We’re part of an academic community at Warwick. Whether studying, teaching, or researching, we are all taking part in an expert conversation which must meet standards of academic integrity. When we all meet these standards, we can take pride in our own academic achievements, as individuals and as an academic community.
Academic integrity means committing to honesty in academic work, giving credit where we’ve used others’ ideas and being proud of our own achievements.
In submitting my work, I confirm that:
- I have read the guidance on academic integrity provided in the Student Handbook and understand the University regulations in relation to Academic Integrity. I am aware of the potential consequences of Academic Misconduct.
- I declare that the work is all my own, except where I have stated otherwise.
- No substantial part(s) of the work submitted here has also been submitted by me in other credit bearing assessments courses of study (other than in certain cases of a re-submission of a piece of work), and I acknowledge that if this has been done this may lead to an appropriate sanction.
- Where a generative Artificial Intelligence such as ChatGPT has been used I confirm I have abided by both the University guidance and specific requirements as set out in the Student Handbook and the Assessment brief. I have clearly acknowledged the use of any generative Artificial Intelligence in my submission, my reasoning for using it and which generative AI (or AIs) I have used. Except where indicated the work is otherwise entirely my own.
- I understand that should this piece of work raise concerns requiring investigation in relation to any of points above, it is possible that other work I have submitted for assessment will be checked, even if marks (provisional or confirmed) have been published.
- Where a proof-reader, paid or unpaid was used, I confirm that the proof-reader was made aware of and has complied with the University’s proofreading policy.
| Variable | Description |
|---|---|
| date | Date on which the bikes were hired |
| Hires | Number of bikes hired on that date |
| wfh | Work-from-home policy. 1 indicates the policy was in force on that date, 0 indicates it was not |
| rule_of_6_indoors | Rule-of-six policy: at most six people from any number of different households were allowed to meet indoors. 1 indicates the policy was in force on that date, 0 indicates it was not |
| eat_out_to_help_out | Policy that gave diners a 50% discount on meals in restaurants. 1 indicates the policy was in force on that date, 0 indicates it was not |
| day | Day of the week of the observation (Sun to Sat) |
| month | Month of the observation (Jan to Dec) |
| year | Year of the observation (2010 to 2023) |
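# Load the packages used below
library(Hmisc)     # rcorr()
library(car)       # vif()
library(emmeans)   # emmeans()
library(tidyverse) # read_csv(), dplyr verbs, ggplot2
# Load the dataset
bike_data <- read_csv("London_COVID_bikes.csv")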
## Rows: 4812 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): day, month
## dbl (12): Hires, schools_closed, pubs_closed, shops_closed, eating_places_c...
## date (1): date
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Check Data Structure
str(bike_data)
## spc_tbl_ [4,812 × 15] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ date : Date[1:4812], format: "2010-07-30" "2010-07-31" ...
## $ Hires : num [1:4812] 6897 5564 4303 6642 7966 ...
## $ schools_closed : num [1:4812] 0 0 0 0 0 0 0 0 0 0 ...
## $ pubs_closed : num [1:4812] 0 0 0 0 0 0 0 0 0 0 ...
## $ shops_closed : num [1:4812] 0 0 0 0 0 0 0 0 0 0 ...
## $ eating_places_closed : num [1:4812] 0 0 0 0 0 0 0 0 0 0 ...
## $ stay_at_home : num [1:4812] 0 0 0 0 0 0 0 0 0 0 ...
## $ household_mixing_indoors_banned: num [1:4812] 0 0 0 0 0 0 0 0 0 0 ...
## $ wfh : num [1:4812] 0 0 0 0 0 0 0 0 0 0 ...
## $ rule_of_6_indoors : num [1:4812] 0 0 0 0 0 0 0 0 0 0 ...
## $ curfew : num [1:4812] 0 0 0 0 0 0 0 0 0 0 ...
## $ eat_out_to_help_out : num [1:4812] 0 0 0 0 0 0 0 0 0 0 ...
## $ day : chr [1:4812] "Fri" "Sat" "Sun" "Mon" ...
## $ month : chr [1:4812] "Jul" "Jul" "Aug" "Aug" ...
## $ year : num [1:4812] 2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 ...
## - attr(*, "spec")=
## .. cols(
## .. date = col_date(format = ""),
## .. Hires = col_double(),
## .. schools_closed = col_double(),
## .. pubs_closed = col_double(),
## .. shops_closed = col_double(),
## .. eating_places_closed = col_double(),
## .. stay_at_home = col_double(),
## .. household_mixing_indoors_banned = col_double(),
## .. wfh = col_double(),
## .. rule_of_6_indoors = col_double(),
## .. curfew = col_double(),
## .. eat_out_to_help_out = col_double(),
## .. day = col_character(),
## .. month = col_character(),
## .. year = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
head(bike_data)
## # A tibble: 6 × 15
## date Hires schools_closed pubs_closed shops_closed eating_places_closed
## <date> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2010-07-30 6897 0 0 0 0
## 2 2010-07-31 5564 0 0 0 0
## 3 2010-08-01 4303 0 0 0 0
## 4 2010-08-02 6642 0 0 0 0
## 5 2010-08-03 7966 0 0 0 0
## 6 2010-08-04 7893 0 0 0 0
## # ℹ 9 more variables: stay_at_home <dbl>,
## # household_mixing_indoors_banned <dbl>, wfh <dbl>, rule_of_6_indoors <dbl>,
## # curfew <dbl>, eat_out_to_help_out <dbl>, day <chr>, month <chr>, year <dbl>
summary(bike_data)
## date Hires schools_closed pubs_closed
## Min. :2010-07-30 Min. : 0 Min. :0.00000 Min. :0.00000
## 1st Qu.:2013-11-13 1st Qu.:19776 1st Qu.:0.00000 1st Qu.:0.00000
## Median :2017-02-28 Median :26356 Median :0.00000 Median :0.00000
## Mean :2017-02-28 Mean :26607 Mean :0.02743 Mean :0.05175
## 3rd Qu.:2020-06-15 3rd Qu.:33481 3rd Qu.:0.00000 3rd Qu.:0.00000
## Max. :2023-09-30 Max. :73094 Max. :1.00000 Max. :1.00000
## shops_closed eating_places_closed stay_at_home
## Min. :0.00000 Min. :0.00000 Min. :0.00000
## 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000
## Median :0.00000 Median :0.00000 Median :0.00000
## Mean :0.04634 Mean :0.05175 Mean :0.03616
## 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000
## Max. :1.00000 Max. :1.00000 Max. :1.00000
## household_mixing_indoors_banned wfh rule_of_6_indoors
## Min. :0.00000 Min. :0.0000 Min. :0.00000
## 1st Qu.:0.00000 1st Qu.:0.0000 1st Qu.:0.00000
## Median :0.00000 Median :0.0000 Median :0.00000
## Mean :0.06525 Mean :0.2273 Mean :0.01995
## 3rd Qu.:0.00000 3rd Qu.:0.0000 3rd Qu.:0.00000
## Max. :1.00000 Max. :1.0000 Max. :1.00000
## curfew eat_out_to_help_out day month
## Min. :0.00000 Min. :0.000000 Length:4812 Length:4812
## 1st Qu.:0.00000 1st Qu.:0.000000 Class :character Class :character
## Median :0.00000 Median :0.000000 Mode :character Mode :character
## Mean :0.01164 Mean :0.005819
## 3rd Qu.:0.00000 3rd Qu.:0.000000
## Max. :1.00000 Max. :1.000000
## year
## Min. :2010
## 1st Qu.:2013
## Median :2017
## Mean :2017
## 3rd Qu.:2020
## Max. :2023
# Count the NA values in each column
bike_data %>% summarise(across(everything(), ~ sum(is.na(.x))))
## # A tibble: 1 × 15
## date Hires schools_closed pubs_closed shops_closed eating_places_closed
## <int> <int> <int> <int> <int> <int>
## 1 0 0 0 0 0 0
## # ℹ 9 more variables: stay_at_home <int>,
## # household_mixing_indoors_banned <int>, wfh <int>, rule_of_6_indoors <int>,
## # curfew <int>, eat_out_to_help_out <int>, day <int>, month <int>, year <int>
# There are no NAs
# Convert the calendar variables to factors with an explicit level order
bike_data <- bike_data %>%
  mutate(
    day   = factor(day, levels = c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")),
    month = factor(month, levels = c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")),
    year  = factor(year, levels = as.character(2010:2023)),
    # Integer codes of the ordered factors (Sun = 1, Jan = 1) for the rank correlations.
    # as.numeric() has no `levels` argument; the order comes from the factors defined above.
    daynum   = factor(as.numeric(day)),
    monthnum = factor(as.numeric(month))
  )
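The `daynum` and `monthnum` columns rely on `as.numeric()` returning a factor's integer level codes, which follow the order given in `levels`. A minimal sketch of that behaviour, using a made-up vector (`toy_day` is illustrative and not part of the dataset):

```r
# Toy vector (hypothetical, not taken from the bike data)
toy_day <- c("Fri", "Sat", "Sun", "Mon")

# Order the levels so that the week starts on Sunday
toy_day_f <- factor(toy_day, levels = c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"))

# as.numeric() on a factor returns the position of each value within `levels`
toy_daynum <- as.numeric(toy_day_f)
print(toy_daynum)
# [1] 6 7 1 2
```

Without an explicit level order, the codes would follow alphabetical order of the labels instead.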
# Keep only the observations from the COVID-19 period (2020 onwards)
bike_data_covid <- bike_data %>%
  filter(year %in% c("2020", "2021", "2022", "2023"))
ggplot(bike_data_covid, aes(x = Hires)) +
  geom_histogram(binwidth = 1200, fill = "darkgreen", color = "black") +
  labs(x = "Number of Hired Bikes", y = "Count", title = "Distribution of Hired Bikes During COVID-19") +
  theme_minimal() +
  theme(plot.caption = element_text(hjust = 0.5))
# Checking the correlation between independent variables
correlation_matrix_bikes <- rcorr(as.matrix(select(bike_data_covid,wfh,rule_of_6_indoors,eat_out_to_help_out,daynum,monthnum,year)), type = "spearman")
print(correlation_matrix_bikes)
## wfh rule_of_6_indoors eat_out_to_help_out daynum monthnum
## wfh 1.00 0.08 -0.29 0 -0.15
## rule_of_6_indoors 0.08 1.00 -0.04 0 0.09
## eat_out_to_help_out -0.29 -0.04 1.00 0 0.08
## daynum 0.00 0.00 0.00 1 0.00
## monthnum -0.15 0.09 0.08 0 1.00
## year 0.40 -0.19 -0.19 0 -0.12
## year
## wfh 0.40
## rule_of_6_indoors -0.19
## eat_out_to_help_out -0.19
## daynum 0.00
## monthnum -0.12
## year 1.00
##
## n= 1370
##
##
## P
## wfh rule_of_6_indoors eat_out_to_help_out daynum
## wfh 0.0027 0.0000 0.8982
## rule_of_6_indoors 0.0027 0.1424 0.9880
## eat_out_to_help_out 0.0000 0.1424 0.9938
## daynum 0.8982 0.9880 0.9938
## monthnum 0.0000 0.0009 0.0021 0.8623
## year 0.0000 0.0000 0.0000 0.9947
## monthnum year
## wfh 0.0000 0.0000
## rule_of_6_indoors 0.0009 0.0000
## eat_out_to_help_out 0.0021 0.0000
## daynum 0.8623 0.9947
## monthnum 0.0000
## year 0.0000
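Spearman's coefficient is Pearson's correlation computed on ranks, which is why it can be applied to ordinal codes such as `daynum` and `monthnum`. The sketch below illustrates this with base R's `cor()` rather than `rcorr()`; the vectors `x` and `y` are made up for illustration.

```r
# Made-up vectors (hypothetical): a 0/1 indicator and a small count variable
x <- c(1, 0, 1, 1, 0, 0, 1, 0)
y <- c(3, 1, 4, 2, 2, 1, 5, 3)

# Spearman correlation computed directly
spearman <- cor(x, y, method = "spearman")

# The same quantity as a Pearson correlation of the (tie-averaged) ranks
pearson_on_ranks <- cor(rank(x), rank(y), method = "pearson")

print(all.equal(spearman, pearson_on_ranks))
```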
# Perform simple regression analyses to assess the individual effects of the three COVID-19 elements
simple_models <- list(
  wfh = lm(Hires ~ wfh, data = bike_data_covid),
  rule_of_6 = lm(Hires ~ rule_of_6_indoors, data = bike_data_covid),
  eat_out = lm(Hires ~ eat_out_to_help_out, data = bike_data_covid)
)
# Output the summaries and confidence intervals for each model
lapply(simple_models, function(model) {
  list(summary = summary(model), confint = confint(model))
})
## $wfh
## $wfh$summary
##
## Call:
## lm(formula = Hires ~ wfh, data = bike_data_covid)
##
## Residuals:
## Min 1Q Median 3Q Max
## -27974 -6682 -632 6371 42196
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 32263.9 620.6 51.987 < 2e-16 ***
## wfh -4289.7 694.5 -6.177 8.62e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10310 on 1368 degrees of freedom
## Multiple R-squared: 0.02713, Adjusted R-squared: 0.02642
## F-statistic: 38.15 on 1 and 1368 DF, p-value: 8.622e-10
##
##
## $wfh$confint
## 2.5 % 97.5 %
## (Intercept) 31046.402 33481.330
## wfh -5652.068 -2927.248
##
##
## $rule_of_6
## $rule_of_6$summary
##
## Call:
## lm(formula = Hires ~ rule_of_6_indoors, data = bike_data_covid)
##
## Residuals:
## Min 1Q Median 3Q Max
## -28319 -6675 -637 6983 41851
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 28319 288 98.323 < 2e-16 ***
## rule_of_6_indoors 7412 1088 6.812 1.43e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10280 on 1368 degrees of freedom
## Multiple R-squared: 0.03281, Adjusted R-squared: 0.03211
## F-statistic: 46.41 on 1 and 1368 DF, p-value: 1.433e-11
##
##
## $rule_of_6$confint
## 2.5 % 97.5 %
## (Intercept) 27753.994 28884.009
## rule_of_6_indoors 5277.854 9546.684
##
##
## $eat_out
## $eat_out$summary
##
## Call:
## lm(formula = Hires ~ eat_out_to_help_out, data = bike_data_covid)
##
## Residuals:
## Min 1Q Median 3Q Max
## -28680 -6862 -675 6927 41490
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 28680.2 283.8 101.066 < 2e-16 ***
## eat_out_to_help_out 7738.2 1985.0 3.898 0.000102 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10400 on 1368 degrees of freedom
## Multiple R-squared: 0.01099, Adjusted R-squared: 0.01026
## F-statistic: 15.2 on 1 and 1368 DF, p-value: 0.0001015
##
##
## $eat_out$confint
## 2.5 % 97.5 %
## (Intercept) 28123.565 29236.93
## eat_out_to_help_out 3844.237 11632.12
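The confidence intervals reported by `confint()` can be reproduced from the estimate, its standard error and the residual degrees of freedom. A minimal sketch for the `wfh` model, with the three inputs copied from the summary output above (so the result is only accurate up to their rounding):

```r
# Values copied from the wfh model summary above
estimate  <- -4289.7
std_error <- 694.5
resid_df  <- 1368

# Two-sided 95% critical value of the t distribution
t_crit <- qt(0.975, df = resid_df)

# Interval: estimate plus or minus the critical value times the standard error
ci <- estimate + c(-1, 1) * t_crit * std_error
print(round(ci, 1))
```

The two limits should land close to the `wfh` row of the `confint()` output.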
# Multiple linear regression model without time variables
multiple_model_notime <- lm(Hires ~ wfh + rule_of_6_indoors + eat_out_to_help_out, data = bike_data_covid)
summary(multiple_model_notime)
##
## Call:
## lm(formula = Hires ~ wfh + rule_of_6_indoors + eat_out_to_help_out,
## data = bike_data_covid)
##
## Residuals:
## Min 1Q Median 3Q Max
## -27326 -6575 -592 6290 42844
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 31535.0 641.8 49.135 < 2e-16 ***
## wfh -4208.7 711.7 -5.914 4.22e-09 ***
## rule_of_6_indoors 8054.2 1071.8 7.514 1.03e-13 ***
## eat_out_to_help_out 4883.4 2012.4 2.427 0.0154 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10090 on 1366 degrees of freedom
## Multiple R-squared: 0.06921, Adjusted R-squared: 0.06716
## F-statistic: 33.85 on 3 and 1366 DF, p-value: < 2.2e-16
# VIF for the model without time variables
vif_notime <- vif(multiple_model_notime)
print(vif_notime)
## wfh rule_of_6_indoors eat_out_to_help_out
## 1.095928 1.006876 1.090480
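For a model with single-degree-of-freedom terms, the variance inflation factor of a predictor equals 1 / (1 - R²), where R² comes from regressing that predictor on the remaining ones. The sketch below computes this by hand on simulated data (all variable names and values are hypothetical) instead of calling `vif()`, and cross-checks it against the diagonal of the inverse correlation matrix.

```r
set.seed(1)
n  <- 200
x1 <- rbinom(n, 1, 0.5)
x2 <- rbinom(n, 1, 0.3)
x3 <- x1 + rnorm(n)  # deliberately correlated with x1

# VIF of x1: regress x1 on the other predictors and apply 1 / (1 - R^2).
# The response variable plays no role in this calculation.
r2_x1  <- summary(lm(x1 ~ x2 + x3))$r.squared
vif_x1 <- 1 / (1 - r2_x1)

# Cross-check: the VIFs are the diagonal of the inverse correlation matrix
vif_check <- solve(cor(cbind(x1, x2, x3)))[1, 1]

print(c(by_regression = vif_x1, by_inverse = vif_check))
```

Values near 1, like those reported above, indicate that the predictor is almost uncorrelated with the others.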
# EMMeans for the model without time variables
emm_notime <- emmeans(multiple_model_notime, ~ wfh + rule_of_6_indoors + eat_out_to_help_out)
print(emm_notime)
## wfh rule_of_6_indoors eat_out_to_help_out emmean SE df lower.CL upper.CL
## 0 0 0 31535 642 1366 30276 32794
## 1 0 0 27326 317 1366 26704 27948
## 0 1 0 39589 1219 1366 37197 41981
## 1 1 0 35381 1032 1366 33357 37405
## 0 0 1 36418 1907 1366 32677 40160
## 1 0 1 32210 2036 1366 28216 36203
## 0 1 1 44473 2188 1366 40181 48764
## 1 1 1 40264 2276 1366 35798 44730
##
## Confidence level used: 0.95
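Because this model has no interaction terms, each estimated marginal mean is the intercept plus the coefficients of whichever policies are switched on. A small sketch that rebuilds the `emmean` column from the coefficients printed in the summary above (copied by hand, so results agree only up to rounding):

```r
# Coefficients copied from summary(multiple_model_notime) above
b <- c(intercept = 31535.0, wfh = -4208.7,
       rule_of_6_indoors = 8054.2, eat_out_to_help_out = 4883.4)

# All 0/1 combinations of the three policy indicators (wfh varies fastest)
grid <- expand.grid(wfh = 0:1, rule_of_6_indoors = 0:1, eat_out_to_help_out = 0:1)

# Predicted hires for each combination
grid$emmean <- b[["intercept"]] +
  b[["wfh"]] * grid$wfh +
  b[["rule_of_6_indoors"]] * grid$rule_of_6_indoors +
  b[["eat_out_to_help_out"]] * grid$eat_out_to_help_out

print(grid)
```

For example, the second row (only `wfh` in force) gives 31535.0 - 4208.7 = 27326.3, which matches the 27326 shown in the table.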
# Linear regression models with time variables
multiple_model_time <- lm(Hires ~ wfh + rule_of_6_indoors + eat_out_to_help_out + year + month + day, data = bike_data_covid)
summary(multiple_model_time)
##
## Call:
## lm(formula = Hires ~ wfh + rule_of_6_indoors + eat_out_to_help_out +
## year + month + day, data = bike_data_covid)
##
## Residuals:
## Min 1Q Median 3Q Max
## -36678 -4707 353 4588 37023
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 21678.1 1001.8 21.639 < 2e-16 ***
## wfh -6195.6 681.1 -9.096 < 2e-16 ***
## rule_of_6_indoors 2271.1 919.9 2.469 0.013678 *
## eat_out_to_help_out -1003.4 1714.4 -0.585 0.558444
## year2021 873.8 581.7 1.502 0.133283
## year2022 5291.9 623.9 8.482 < 2e-16 ***
## year2023 -3642.6 680.3 -5.355 1.01e-07 ***
## monthFeb 2503.8 980.1 2.555 0.010741 *
## monthMar 5149.8 960.8 5.360 9.78e-08 ***
## monthApr 8587.5 980.0 8.763 < 2e-16 ***
## monthMay 14409.0 973.7 14.798 < 2e-16 ***
## monthJun 18811.1 997.0 18.868 < 2e-16 ***
## monthJul 16842.6 968.1 17.398 < 2e-16 ***
## monthAug 13699.7 1034.8 13.238 < 2e-16 ***
## monthSep 12224.2 985.1 12.409 < 2e-16 ***
## monthOct 8321.6 1051.8 7.912 5.24e-15 ***
## monthNov 5402.8 1052.3 5.134 3.25e-07 ***
## monthDec -1822.8 1046.7 -1.742 0.081818 .
## dayMon 119.8 762.3 0.157 0.875116
## dayTue 2826.8 763.5 3.703 0.000222 ***
## dayWed 2733.8 762.5 3.585 0.000349 ***
## dayThu 2914.4 762.5 3.822 0.000138 ***
## dayFri 2034.3 762.4 2.668 0.007716 **
## daySat 3679.6 762.3 4.827 1.55e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7536 on 1346 degrees of freedom
## Multiple R-squared: 0.4886, Adjusted R-squared: 0.4799
## F-statistic: 55.92 on 23 and 1346 DF, p-value: < 2.2e-16
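The adjusted R-squared penalises the multiple R-squared for the number of estimated slopes. A short check using the figures printed above, assuming n = 1370 observations (residual degrees of freedom plus the 24 estimated coefficients) and p = 23 slopes (the numerator degrees of freedom of the F-statistic):

```r
r2 <- 0.4886  # Multiple R-squared reported above
n  <- 1370    # observations: 1346 residual df + 24 coefficients
p  <- 23      # slope coefficients (numerator df of the F-statistic)

# Adjusted R-squared: shrink the explained share by (n - 1) / (n - p - 1)
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(round(adj_r2, 4))
# [1] 0.4799
```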
# VIF for the model with time variables
vif_time <- vif(multiple_model_time)
print(vif_time)
## GVIF Df GVIF^(1/(2*Df))
## wfh 1.800483 1 1.341821
## rule_of_6_indoors 1.330255 1 1.153367
## eat_out_to_help_out 1.419508 1 1.191431
## year 1.803536 3 1.103284
## month 1.936388 11 1.030493
## day 1.003467 6 1.000288
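For terms with more than one degree of freedom, the last column rescales the generalised VIF as GVIF^(1/(2·Df)) so that terms with different numbers of coefficients can be compared. The sketch below reproduces that column for four of the terms, using the GVIF and Df values copied from the output above:

```r
# GVIF and Df values copied from the vif() output above
gvif <- c(wfh = 1.800483, year = 1.803536, month = 1.936388, day = 1.003467)
df   <- c(wfh = 1,        year = 3,        month = 11,       day = 6)

# Rescale so that terms with different degrees of freedom are comparable
gvif_scaled <- gvif^(1 / (2 * df))
print(round(gvif_scaled, 6))
```

For a single-degree-of-freedom term such as `wfh`, the scaled value is simply the square root of the GVIF.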
# EMMeans for the model with time variables
emm_time <- emmeans(multiple_model_time, ~ wfh + rule_of_6_indoors + eat_out_to_help_out)
print(emm_time)
## wfh rule_of_6_indoors eat_out_to_help_out emmean SE df lower.CL upper.CL
## 0 0 0 33030 584 1346 31884 34177
## 1 0 0 26835 258 1346 26329 27340
## 0 1 0 35302 1155 1346 33036 37567
## 1 1 0 29106 865 1346 27409 30803
## 0 0 1 32027 1685 1346 28721 35333
## 1 0 1 25831 1731 1346 22437 29226
## 0 1 1 34298 1936 1346 30500 38096
## 1 1 1 28103 1896 1346 24384 31821
##
## Results are averaged over the levels of: year, month, day
## Confidence level used: 0.95
# Compare the models with and without time variables using ANOVA
anova(multiple_model_notime, multiple_model_time)
## Analysis of Variance Table
##
## Model 1: Hires ~ wfh + rule_of_6_indoors + eat_out_to_help_out
## Model 2: Hires ~ wfh + rule_of_6_indoors + eat_out_to_help_out + year +
## month + day
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 1366 1.3914e+11
## 2 1346 7.6441e+10 20 6.2696e+10 55.199 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
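The F statistic in this table compares the drop in residual sum of squares per added parameter with the residual variance of the larger model. A minimal sketch that recomputes it from the rounded RSS values printed above, so it agrees with the table only to about two decimal places:

```r
# Residual sums of squares and residual df copied from the ANOVA table above
rss_small <- 1.3914e11; df_small <- 1366  # model without time variables
rss_large <- 7.6441e10; df_large <- 1346  # model with time variables

# F = (reduction in RSS per extra parameter) / (residual variance of larger model)
extra_df <- df_small - df_large
f_stat   <- ((rss_small - rss_large) / extra_df) / (rss_large / df_large)

# Upper-tail probability of the F distribution
p_value <- pf(f_stat, extra_df, df_large, lower.tail = FALSE)

print(f_stat)
```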
# Interaction models including time variables
interaction_model_wfh <- lm(Hires ~ wfh * year * month * day, data = bike_data_covid)
interaction_model_rule_of_6 <- lm(Hires ~ rule_of_6_indoors * year * month * day, data = bike_data_covid)
interaction_model_eat_out <- lm(Hires ~ eat_out_to_help_out * year * month * day, data = bike_data_covid)
# Function to extract only the significant coefficients of a model
display_significant_coefs <- function(model) {
  coefs <- summary(model)$coefficients
  # Keep rows with a p-value below 0.05 (significant at the 5% level);
  # drop = FALSE keeps the matrix layout even if only one row qualifies
  significant_coefs <- coefs[coefs[, 4] < 0.05, , drop = FALSE]
  return(significant_coefs)
}
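As a quick illustration of this kind of filter, the sketch below applies the same idea to R's built-in `mtcars` data. The helper name `significant_rows`, the `alpha` argument and the toy model are hypothetical and only serve the example; `drop = FALSE` is used so that a single matching row still comes back as a matrix.

```r
# Hypothetical helper: keep coefficient rows whose p-value is below alpha
significant_rows <- function(model, alpha = 0.05) {
  coefs <- summary(model)$coefficients
  p     <- coefs[, "Pr(>|t|)"]
  keep  <- !is.na(p) & p < alpha
  coefs[keep, , drop = FALSE]
}

# Toy model on a built-in dataset (not the bike data)
toy_model <- lm(mpg ~ wt + hp, data = mtcars)
sig <- significant_rows(toy_model)
print(sig)
```

With several large interaction models, many coefficients pass a 5% threshold by chance alone, so a filtered table like this is best read as a screening aid rather than a set of confirmed effects.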
# Extract and display only the significant coefficients from each interaction model
list(
  wfh = display_significant_coefs(interaction_model_wfh),
  rule_of_6 = display_significant_coefs(interaction_model_rule_of_6),
  eat_out = display_significant_coefs(interaction_model_eat_out)
)
## $wfh
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 15064.00 3155.150 4.774416 2.063705e-06
## wfh 13319.30 6150.520 2.165557 3.057469e-02
## year2023 -15775.90 5986.476 -2.635256 8.534021e-03
## monthMay 15789.70 5986.476 2.637562 8.476630e-03
## monthJun 17728.45 6150.520 2.882431 4.028425e-03
## monthAug 23736.00 4233.078 5.607268 2.641179e-08
## monthSep 29916.00 4819.571 6.207191 7.815248e-10
## monthOct 22918.45 6414.609 3.572852 3.695046e-04
## monthNov 15879.00 6567.968 2.417643 1.579467e-02
## dayMon 9773.00 4462.056 2.190246 2.873066e-02
## dayTue 11842.00 4462.056 2.653934 8.078872e-03
## dayWed 8656.80 4233.078 2.045037 4.110569e-02
## dayThu 10536.80 4233.078 2.489158 1.296201e-02
## dayFri 9714.60 4233.078 2.294926 2.193831e-02
## wfh:monthAug -15506.65 5986.476 -2.590280 9.725614e-03
## wfh:monthSep -31605.30 9535.307 -3.314555 9.499741e-04
## wfh:monthOct -25825.75 8886.850 -2.906063 3.738654e-03
## year2021:monthMar 28369.95 8105.727 3.499988 4.852487e-04
## year2022:monthMar 20865.75 7055.130 2.957529 3.172163e-03
## year2023:monthMar 18277.15 8105.727 2.254844 2.435308e-02
## year2021:monthApr 15094.95 7466.448 2.021704 4.346583e-02
## monthAug:dayMon -12482.80 5986.476 -2.085167 3.730067e-02
## monthSep:dayMon -20299.33 6815.903 -2.978231 2.967278e-03
## monthApr:dayTue -18479.00 8698.148 -2.124475 3.386880e-02
## monthAug:dayTue -12773.25 6150.520 -2.076776 3.807044e-02
## monthSep:dayTue -17232.33 6815.903 -2.528254 1.161207e-02
## monthApr:dayWed -20131.70 8582.936 -2.345549 1.918856e-02
## monthSep:dayWed -15440.47 6668.249 -2.315521 2.078075e-02
## monthAug:dayThu -13803.05 5986.476 -2.305705 2.132561e-02
## monthSep:dayThu -17804.13 6668.249 -2.669986 7.705197e-03
## monthApr:dayFri -17669.55 8698.148 -2.031415 4.246999e-02
## monthAug:dayFri -14283.85 5986.476 -2.386020 1.721220e-02
## monthSep:dayFri -15875.93 6668.249 -2.380825 1.745545e-02
## wfh:monthSep:dayMon 30802.73 13484.960 2.284229 2.256144e-02
## wfh:monthSep:dayTue 32342.58 12725.337 2.541590 1.118104e-02
## wfh:monthSep:dayThu 27267.03 13336.497 2.044542 4.115455e-02
## year2022:monthApr:dayTue 19739.30 8811.854 2.240085 2.529868e-02
## year2022:monthJun:dayTue 19008.70 8698.148 2.185373 2.908682e-02
## year2022:monthApr:dayWed 18329.00 8698.148 2.107230 3.533960e-02
## year2021:monthJun:dayWed 25868.05 10559.151 2.449823 1.445873e-02
## year2022:monthJun:dayWed 22260.90 8698.148 2.559269 1.063158e-02
## year2022:monthJun:dayFri 18693.10 8811.854 2.121358 3.413069e-02
## year2022:monthNov:dayFri 17447.25 8698.148 2.005858 4.513330e-02
##
## $rule_of_6
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 15064.00 3206.120 4.698514 2.976735e-06
## monthApr 19324.50 4534.139 4.262000 2.212390e-05
## monthMay 29109.00 4301.462 6.767234 2.202309e-11
## monthJun 31047.75 4534.139 6.847552 1.291149e-11
## monthJul 26207.00 4534.139 5.779929 9.905247e-09
## monthAug 23736.00 4301.462 5.518124 4.337319e-08
## monthSep 28488.50 5553.163 5.130139 3.460221e-07
## monthOct 14437.00 5553.163 2.599780 9.462232e-03
## monthNov 13346.40 4301.462 3.102759 1.969644e-03
## dayMon 9773.00 4534.139 2.155426 3.136033e-02
## dayTue 11842.00 4534.139 2.611742 9.139636e-03
## dayWed 8656.80 4301.462 2.012525 4.442527e-02
## dayThu 10536.80 4301.462 2.449586 1.446819e-02
## dayFri 9714.60 4301.462 2.258442 2.412729e-02
## rule_of_6_indoors:year2021 29452.67 9794.859 3.006952 2.703035e-03
## rule_of_6_indoors:monthJun -30333.27 9689.346 -3.130579 1.793894e-03
## year2021:monthMar 13120.85 6083.185 2.156904 3.124460e-02
## year2023:monthApr -13439.50 6083.185 -2.209287 2.737480e-02
## year2021:monthMay -16665.93 6358.581 -2.621015 8.896372e-03
## year2022:monthMay -15645.40 5911.794 -2.646473 8.258014e-03
## year2023:monthMay -16880.40 6083.185 -2.774928 5.621594e-03
## year2022:monthJun -19145.55 6249.879 -3.063347 2.245802e-03
## year2023:monthJun -19684.65 6249.879 -3.149605 1.682142e-03
## year2021:monthJul -22392.60 8360.539 -2.678368 7.516322e-03
## year2023:monthJul -17511.80 6083.185 -2.878722 4.075717e-03
## year2023:monthAug -15506.65 6083.185 -2.549100 1.094460e-02
## year2022:monthSep -31741.05 7024.258 -4.518777 6.941925e-06
## year2023:monthSep -19097.40 7024.258 -2.718778 6.662778e-03
## year2022:monthOct -15913.80 6876.362 -2.314276 2.084915e-02
## year2022:monthNov -20327.45 6083.185 -3.341580 8.630496e-04
## year2022:monthDec -13319.30 6249.879 -2.131129 3.331546e-02
## year2021:dayMon -12834.60 6249.879 -2.053576 4.026956e-02
## year2021:dayTue -15730.10 6249.879 -2.516865 1.199181e-02
## year2021:dayWed -13123.90 6083.185 -2.157406 3.120543e-02
## year2021:dayThu -13619.90 6083.185 -2.238942 2.537319e-02
## monthApr:dayMon -28352.25 6412.241 -4.421582 1.084084e-05
## monthMay:dayMon -21863.50 6249.879 -3.498228 4.884252e-04
## monthJun:dayMon -19731.35 6249.879 -3.157077 1.640034e-03
## monthJul:dayMon -18006.00 6412.241 -2.808067 5.078414e-03
## monthAug:dayMon -12482.80 6083.185 -2.052017 4.042111e-02
## monthSep:dayMon -20371.50 9068.278 -2.246457 2.488655e-02
## monthNov:dayMon -16653.60 6083.185 -2.737645 6.295020e-03
## monthApr:dayTue -32976.75 6412.241 -5.142781 3.240677e-07
## monthMay:dayTue -27881.50 6249.879 -4.461126 9.052109e-06
## monthJun:dayTue -18558.15 6249.879 -2.969362 3.053530e-03
## monthJul:dayTue -16647.75 6412.241 -2.596245 9.559466e-03
## monthAug:dayTue -12773.25 6249.879 -2.043760 4.123201e-02
## monthSep:dayTue -17806.00 7853.359 -2.267310 2.357844e-02
## monthNov:dayTue -15307.40 6249.879 -2.449231 1.448235e-02
## monthApr:dayWed -28340.10 6083.185 -4.658760 3.598865e-06
## monthMay:dayWed -22482.30 6083.185 -3.695810 2.307640e-04
## monthJun:dayWed -23639.30 6249.879 -3.782361 1.643163e-04
## monthJul:dayWed -15798.20 6083.185 -2.597028 9.537872e-03
## monthSep:dayWed -16592.30 7721.360 -2.148883 3.187687e-02
## monthOct:dayWed -15363.30 7721.360 -1.989714 4.688728e-02
## monthApr:dayThu -29526.30 6083.185 -4.853756 1.399410e-06
## monthMay:dayThu -23530.05 6083.185 -3.868047 1.166161e-04
## monthJun:dayThu -24019.30 6249.879 -3.843162 1.289160e-04
## monthJul:dayThu -16849.20 6083.185 -2.769799 5.710200e-03
## monthAug:dayThu -13803.05 6083.185 -2.269050 2.347207e-02
## monthSep:dayThu -18536.80 7721.360 -2.400717 1.654004e-02
## monthNov:dayThu -16503.20 6083.185 -2.712921 6.780836e-03
## monthApr:dayFri -25850.60 6249.879 -4.136176 3.820034e-05
## monthMay:dayFri -20046.60 5911.794 -3.390951 7.230303e-04
## monthJun:dayFri -20042.35 6249.879 -3.206838 1.383480e-03
## monthAug:dayFri -14283.85 6083.185 -2.348087 1.905903e-02
## monthSep:dayFri -16092.10 7721.360 -2.084102 3.739761e-02
## monthNov:dayFri -15748.00 6083.185 -2.588775 9.767933e-03
## rule_of_6_indoors:year2021:dayMon -32115.67 13348.143 -2.406003 1.630402e-02
## rule_of_6_indoors:year2021:dayTue -34968.17 13348.143 -2.619703 8.930435e-03
## rule_of_6_indoors:year2021:dayWed -34230.67 13348.143 -2.564452 1.047512e-02
## rule_of_6_indoors:year2021:dayThu -38310.33 12824.481 -2.987281 2.881567e-03
## rule_of_6_indoors:monthMay:dayMon 30735.83 12824.481 2.396653 1.672354e-02
## rule_of_6_indoors:monthJun:dayMon 47893.62 13193.227 3.630167 2.972073e-04
## rule_of_6_indoors:monthJun:dayTue 53764.77 13115.083 4.099461 4.467966e-05
## rule_of_6_indoors:monthJun:dayWed 51143.62 13115.083 3.899603 1.026092e-04
## rule_of_6_indoors:monthJun:dayThu 56432.40 12930.910 4.364148 1.405122e-05
## rule_of_6_indoors:monthJun:dayFri 31180.53 12851.171 2.426280 1.542584e-02
## year2021:monthApr:dayMon 23106.60 8954.207 2.580530 1.000273e-02
## year2022:monthApr:dayMon 24330.15 8838.664 2.752696 6.014864e-03
## year2023:monthApr:dayMon 22528.95 8721.590 2.583124 9.928322e-03
## year2021:monthMay:dayMon 20981.93 9582.671 2.189570 2.877981e-02
## year2022:monthMay:dayMon 19855.10 8482.597 2.340687 1.943892e-02
## year2022:monthJun:dayMon 25240.00 8721.590 2.893968 3.884510e-03
## year2023:monthJun:dayMon 18180.95 8721.590 2.084591 3.735302e-02
## year2021:monthJul:dayMon 39076.10 11013.385 3.548055 4.056289e-04
## year2022:monthJul:dayMon 17515.15 8721.590 2.008252 4.487798e-02
## year2022:monthSep:dayMon 25829.15 10919.652 2.365382 1.819649e-02
## year2021:monthNov:dayMon 23339.05 8602.923 2.712921 6.780834e-03
## year2022:monthNov:dayMon 23185.50 8602.923 2.695072 7.152254e-03
## year2021:monthDec:dayMon 19586.85 8838.664 2.216042 2.690722e-02
## year2021:monthApr:dayTue 30099.60 8954.207 3.361504 8.037591e-04
## year2022:monthApr:dayTue 34237.05 8954.207 3.823572 1.394500e-04
## year2023:monthApr:dayTue 31581.45 8721.590 3.621066 3.077241e-04
## year2021:monthMay:dayTue 30654.93 9582.671 3.198997 1.421280e-03
## year2022:monthMay:dayTue 28903.20 8602.923 3.359695 8.089811e-04
## year2023:monthMay:dayTue 24224.90 8602.923 2.815892 4.957300e-03
## year2022:monthJun:dayTue 33506.45 8838.664 3.790895 1.588466e-04
## year2023:monthJun:dayTue 18172.00 8721.590 2.083565 3.744656e-02
## year2021:monthJul:dayTue 39778.85 11013.385 3.611864 3.187107e-04
## year2022:monthJul:dayTue 18737.30 8838.664 2.119925 3.425173e-02
## year2023:monthJul:dayTue 17121.25 8721.590 1.963088 4.990550e-02
## year2021:monthAug:dayTue 20266.35 8602.923 2.355752 1.867247e-02
## year2022:monthSep:dayTue 26528.80 10036.744 2.643168 8.338493e-03
## year2021:monthOct:dayTue 21629.90 9933.800 2.177404 2.967747e-02
## year2022:monthOct:dayTue 25097.30 9933.800 2.526455 1.167133e-02
## year2021:monthNov:dayTue 26574.35 8721.590 3.046962 2.370738e-03
## year2022:monthNov:dayTue 24890.75 8721.590 2.853923 4.405076e-03
## year2021:monthDec:dayTue 17859.55 8838.664 2.020617 4.357862e-02
## year2021:monthApr:dayWed 23372.95 8721.590 2.679896 7.482353e-03
## year2022:monthApr:dayWed 26537.40 8721.590 3.042725 2.404061e-03
## year2023:monthApr:dayWed 26272.85 8602.923 3.053944 2.316737e-03
## year2021:monthMay:dayWed 26631.73 9474.795 2.810798 5.035842e-03
## year2022:monthMay:dayWed 20185.45 8602.923 2.346348 1.914773e-02
## year2023:monthMay:dayWed 19319.55 8602.923 2.245696 2.493549e-02
## year2022:monthJun:dayWed 30469.30 8721.590 3.493549 4.969617e-04
## year2023:monthJun:dayWed 25054.45 8838.664 2.834642 4.677649e-03
## year2021:monthJul:dayWed 36792.30 10825.108 3.398793 7.028396e-04
## year2021:monthAug:dayWed 23309.90 8602.923 2.709532 6.849980e-03
## year2021:monthSep:dayWed 22702.85 9829.779 2.309599 2.110798e-02
## year2022:monthSep:dayWed 25928.10 9933.800 2.610089 9.183624e-03
## year2021:monthOct:dayWed 25122.20 9829.779 2.555724 1.073978e-02
## year2022:monthOct:dayWed 20911.85 9829.779 2.127398 3.362480e-02
## year2021:monthNov:dayWed 21709.55 8721.590 2.489173 1.296145e-02
## year2022:monthNov:dayWed 17379.95 8602.923 2.020238 4.361795e-02
## year2021:monthDec:dayWed 19020.25 8602.923 2.210905 2.726211e-02
## year2022:monthMar:dayThu 17375.45 8602.923 2.019715 4.367231e-02
## year2021:monthApr:dayThu 26236.45 8602.923 3.049713 2.349321e-03
## year2022:monthApr:dayThu 29811.10 8721.590 3.418081 6.553920e-04
## year2023:monthApr:dayThu 24640.80 8602.923 2.864236 4.265287e-03
## year2021:monthMay:dayThu 24994.48 9474.795 2.637997 8.465824e-03
## year2022:monthMay:dayThu 26721.70 8602.923 3.106119 1.947612e-03
## year2023:monthMay:dayThu 20261.95 8721.590 2.323195 2.036326e-02
## year2022:monthJun:dayThu 30472.90 8721.590 3.493962 4.962031e-04
## year2023:monthJun:dayThu 22893.35 8721.590 2.624906 8.796036e-03
## year2021:monthJul:dayThu 45121.30 10825.108 4.168208 3.328675e-05
## year2022:monthJul:dayThu 18554.25 8602.923 2.156738 3.125762e-02
## year2023:monthJul:dayThu 17216.75 8602.923 2.001267 4.562633e-02
## year2021:monthAug:dayThu 19868.15 8602.923 2.309465 2.111545e-02
## year2021:monthSep:dayThu 21123.35 9829.779 2.148914 3.187441e-02
## year2022:monthSep:dayThu 27357.95 9829.779 2.783171 5.481793e-03
## year2021:monthOct:dayThu 21539.95 9829.779 2.191296 2.865445e-02
## year2022:monthOct:dayThu 19646.85 9829.779 1.998707 4.590325e-02
## year2021:monthNov:dayThu 27693.80 8721.590 3.175315 1.541301e-03
## year2022:monthNov:dayThu 27214.50 8721.590 3.120360 1.856705e-03
## year2021:monthDec:dayThu 20720.25 8602.923 2.408513 1.619297e-02
## year2021:monthApr:dayFri 24500.05 8602.923 2.847875 4.488989e-03
## year2022:monthApr:dayFri 23176.85 8721.590 2.657411 7.996573e-03
## year2023:monthApr:dayFri 20370.85 8721.590 2.335681 1.969962e-02
## year2021:monthMay:dayFri 25442.03 9255.272 2.748923 6.084002e-03
## year2022:monthMay:dayFri 21365.00 8482.597 2.518686 1.193035e-02
## year2022:monthJun:dayFri 26874.15 8838.664 3.040522 2.421559e-03
## year2023:monthJun:dayFri 17485.20 8721.590 2.004818 4.524461e-02
## year2021:monthJul:dayFri 27986.00 10729.731 2.608267 9.232322e-03
## year2021:monthAug:dayFri 17570.20 8482.597 2.071323 3.857787e-02
## year2022:monthAug:dayFri 19942.15 8721.590 2.286527 2.242629e-02
## year2022:monthSep:dayFri 21822.50 9829.779 2.220040 2.663377e-02
## year2021:monthNov:dayFri 23079.60 8602.923 2.682762 7.418967e-03
## year2022:monthNov:dayFri 25628.30 8721.590 2.938489 3.371907e-03
##
## $eat_out
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 15064.00 3210.781 4.691693 3.066454e-06
## monthApr 19324.50 4540.731 4.255813 2.268708e-05
## monthMay 29109.00 4307.715 6.757410 2.322287e-11
## monthJun 31047.75 4540.731 6.837611 1.362723e-11
## monthJul 26207.00 4540.731 5.771538 1.032751e-08
## monthAug 29033.00 7179.526 4.043861 5.641609e-05
## monthSep 25344.50 4540.731 5.581591 3.031570e-08
## monthOct 10412.00 4540.731 2.293023 2.204326e-02
## monthNov 13346.40 4307.715 3.098255 1.998279e-03
## dayMon 9773.00 4540.731 2.152297 3.160093e-02
## dayTue 11842.00 4540.731 2.607950 9.237654e-03
## dayWed 8656.80 4307.715 2.009604 4.472816e-02
## dayThu 10536.80 4307.715 2.446030 1.460686e-02
## dayFri 9714.60 4307.715 2.255163 2.432799e-02
## year2021:monthMar 13120.85 6092.029 2.153773 3.148460e-02
## year2023:monthApr -13439.50 6092.029 -2.206079 2.759399e-02
## year2021:monthMay -13702.80 5920.388 -2.314510 2.083159e-02
## year2022:monthMay -15645.40 5920.388 -2.642631 8.348660e-03
## year2023:monthMay -16880.40 6092.029 -2.770899 5.688676e-03
## year2022:monthJun -19145.55 6258.965 -3.058900 2.277721e-03
## year2023:monthJun -19684.65 6258.965 -3.145033 1.707255e-03
## year2023:monthJul -17511.80 6092.029 -2.874543 4.127611e-03
## year2023:monthAug -20803.65 8372.694 -2.484702 1.312060e-02
## year2022:monthSep -28597.05 6258.965 -4.568974 5.481290e-06
## year2023:monthSep -15953.40 6258.965 -2.548888 1.094778e-02
## year2022:monthNov -20327.45 6092.029 -3.336729 8.773584e-04
## year2022:monthDec -13319.30 6258.965 -2.128035 3.356612e-02
## year2021:dayMon -12834.60 6258.965 -2.050595 4.055380e-02
## year2021:dayTue -15730.10 6258.965 -2.513211 1.211233e-02
## year2021:dayWed -13123.90 6092.029 -2.154274 3.144523e-02
## year2021:dayThu -13619.90 6092.029 -2.235692 2.558111e-02
## monthApr:dayMon -28352.25 6421.563 -4.415163 1.113503e-05
## monthMay:dayMon -21863.50 6258.965 -3.493149 4.972075e-04
## monthJun:dayMon -19731.35 6258.965 -3.152494 1.664621e-03
## monthJul:dayMon -18006.00 6421.563 -2.803990 5.140297e-03
## monthSep:dayMon -16054.75 6421.563 -2.500131 1.256605e-02
## monthNov:dayMon -16653.60 6092.029 -2.733670 6.368366e-03
## monthApr:dayTue -32976.75 6421.563 -5.135315 3.354582e-07
## monthMay:dayTue -27881.50 6258.965 -4.454650 9.301584e-06
## monthJun:dayTue -18558.15 6258.965 -2.965051 3.094600e-03
## monthJul:dayTue -16647.75 6421.563 -2.592476 9.660920e-03
## monthSep:dayTue -12837.30 6258.965 -2.051026 4.051169e-02
## monthNov:dayTue -15307.40 6258.965 -2.445676 1.462112e-02
## monthApr:dayWed -28340.10 6092.029 -4.651997 3.705762e-06
## monthMay:dayWed -22482.30 6092.029 -3.690445 2.353367e-04
## monthJun:dayWed -23639.30 6258.965 -3.776870 1.677081e-04
## monthJul:dayWed -15798.20 6092.029 -2.593257 9.639150e-03
## monthSep:dayWed -14561.30 6092.029 -2.390222 1.701336e-02
## monthApr:dayThu -29526.30 6092.029 -4.846710 1.444003e-06
## monthMay:dayThu -23530.05 6092.029 -3.862432 1.191203e-04
## monthJun:dayThu -24019.30 6258.965 -3.837583 1.316530e-04
## monthJul:dayThu -16849.20 6092.029 -2.765778 5.778117e-03
## monthSep:dayThu -14411.05 6258.965 -2.302465 2.150345e-02
## monthNov:dayThu -16503.20 6092.029 -2.708982 6.858589e-03
## monthApr:dayFri -25850.60 6258.965 -4.130171 3.912313e-05
## monthMay:dayFri -20046.60 5920.388 -3.386028 7.353333e-04
## monthJun:dayFri -20042.35 6258.965 -3.202183 1.404804e-03
## monthSep:dayFri -13300.85 6258.965 -2.125088 3.381187e-02
## monthNov:dayFri -15748.00 6092.029 -2.585017 9.871076e-03
## year2021:monthApr:dayMon 23106.60 8967.225 2.576784 1.010776e-02
## year2022:monthApr:dayMon 24330.15 8851.514 2.748699 6.085627e-03
## year2023:monthApr:dayMon 22528.95 8734.270 2.579374 1.003276e-02
## year2021:monthMay:dayMon 22942.70 8615.431 2.662978 7.863506e-03
## year2022:monthMay:dayMon 19855.10 8494.929 2.337288 1.961101e-02
## year2021:monthJun:dayMon 17956.45 8851.514 2.028630 4.274744e-02
## year2022:monthJun:dayMon 25240.00 8734.270 2.889766 3.934432e-03
## year2023:monthJun:dayMon 18180.95 8734.270 2.081565 3.762356e-02
## year2021:monthJul:dayMon 18756.85 8967.225 2.091712 3.670362e-02
## year2022:monthJul:dayMon 17515.15 8734.270 2.005336 4.518285e-02
## year2022:monthSep:dayMon 21512.40 8851.514 2.430364 1.525002e-02
## year2021:monthNov:dayMon 23339.05 8615.431 2.708982 6.858587e-03
## year2022:monthNov:dayMon 23185.50 8615.431 2.691160 7.233318e-03
## year2021:monthDec:dayMon 19586.85 8851.514 2.212825 2.712380e-02
## year2021:monthApr:dayTue 30099.60 8967.225 3.356624 8.172260e-04
## year2022:monthApr:dayTue 34237.05 8967.225 3.818021 1.423841e-04
## year2023:monthApr:dayTue 31581.45 8734.270 3.615809 3.136056e-04
## year2021:monthMay:dayTue 28709.55 8734.270 3.287001 1.046087e-03
## year2022:monthMay:dayTue 28903.20 8615.431 3.354818 8.225226e-04
## year2023:monthMay:dayTue 24224.90 8615.431 2.811804 5.018005e-03
## year2021:monthJun:dayTue 25650.10 8734.270 2.936719 3.389279e-03
## year2022:monthJun:dayTue 33506.45 8851.514 3.785392 1.621386e-04
## year2023:monthJun:dayTue 18172.00 8734.270 2.080540 3.771755e-02
## year2021:monthJul:dayTue 20370.85 8967.225 2.271701 2.330586e-02
## year2022:monthJul:dayTue 18737.30 8851.514 2.116847 3.450709e-02
## year2022:monthSep:dayTue 21560.10 8851.514 2.435753 1.502601e-02
## year2021:monthOct:dayTue 18203.15 8851.514 2.056501 3.998043e-02
## year2022:monthOct:dayTue 21670.55 8851.514 2.448231 1.451843e-02
## year2021:monthNov:dayTue 26574.35 8734.270 3.042538 2.404114e-03
## year2022:monthNov:dayTue 24890.75 8734.270 2.849780 4.460312e-03
## year2021:monthDec:dayTue 17859.55 8851.514 2.017683 4.387779e-02
## year2021:monthApr:dayWed 23372.95 8734.270 2.676005 7.566319e-03
## year2022:monthApr:dayWed 26537.40 8734.270 3.038308 2.437823e-03
## year2023:monthApr:dayWed 26272.85 8615.431 3.049511 2.349485e-03
## year2021:monthMay:dayWed 25537.10 8615.431 2.964112 3.103979e-03
## year2022:monthMay:dayWed 20185.45 8615.431 2.342941 1.931795e-02
## year2023:monthMay:dayWed 19319.55 8615.431 2.242436 2.514089e-02
## year2021:monthJun:dayWed 34076.45 8734.270 3.901465 1.016834e-04
## year2022:monthJun:dayWed 30469.30 8734.270 3.488477 5.058763e-04
## year2023:monthJun:dayWed 25054.45 8851.514 2.830527 4.735604e-03
## year2021:monthJul:dayWed 22908.05 8734.270 2.622778 8.847697e-03
## year2021:monthSep:dayWed 20671.85 8615.431 2.399398 1.659517e-02
## year2022:monthSep:dayWed 23897.10 8734.270 2.736016 6.323476e-03
## year2021:monthNov:dayWed 21709.55 8734.270 2.485560 1.308921e-02
## year2022:monthNov:dayWed 17379.95 8615.431 2.017305 4.391730e-02
## year2021:monthDec:dayWed 19020.25 8615.431 2.207696 2.748068e-02
## year2022:monthMar:dayThu 17375.45 8615.431 2.016783 4.397190e-02
## year2021:monthApr:dayThu 26236.45 8615.431 3.045286 2.382449e-03
## year2022:monthApr:dayThu 29811.10 8734.270 3.413119 6.667026e-04
## year2023:monthApr:dayThu 24640.80 8615.431 2.860078 4.319112e-03
## year2021:monthMay:dayThu 25310.35 8615.431 2.937793 3.377664e-03
## year2022:monthMay:dayThu 26721.70 8615.431 3.101609 1.975981e-03
## year2023:monthMay:dayThu 20261.95 8734.270 2.319822 2.054119e-02
## year2021:monthJun:dayThu 30431.40 8851.514 3.437988 6.089962e-04
## year2022:monthJun:dayThu 30472.90 8734.270 3.488889 5.051059e-04
## year2023:monthJun:dayThu 22893.35 8734.270 2.621095 8.891207e-03
## year2021:monthJul:dayThu 26310.30 8615.431 3.053858 2.316007e-03
## year2022:monthJul:dayThu 18554.25 8615.431 2.153607 3.149769e-02
## year2023:monthJul:dayThu 17216.75 8615.431 1.998362 4.593444e-02
## year2022:monthSep:dayThu 23232.20 8734.270 2.659890 7.935455e-03
## year2021:monthNov:dayThu 27693.80 8734.270 3.170706 1.564645e-03
## year2022:monthNov:dayThu 27214.50 8734.270 3.115830 1.883970e-03
## year2021:monthDec:dayThu 20720.25 8615.431 2.405016 1.634365e-02
## year2021:monthApr:dayFri 24500.05 8615.431 2.843741 4.545066e-03
## year2022:monthApr:dayFri 23176.85 8734.270 2.653553 8.084988e-03
## year2023:monthApr:dayFri 20370.85 8734.270 2.332290 1.987336e-02
## year2021:monthMay:dayFri 20349.90 8372.694 2.430508 1.524399e-02
## year2022:monthMay:dayFri 21365.00 8494.929 2.515030 1.205040e-02
## year2022:monthJun:dayFri 26874.15 8851.514 3.036108 2.455523e-03
## year2023:monthJun:dayFri 17485.20 8734.270 2.001907 4.555107e-02
## year2022:monthSep:dayFri 19031.25 8734.270 2.178917 2.955917e-02
## year2021:monthNov:dayFri 23079.60 8615.431 2.678868 7.502379e-03
## year2022:monthNov:dayFri 25628.30 8734.270 2.934224 3.416428e-03
# I use filter() so that only the significant results are shown
# Plot diagnostic plots for each interaction model to check for any violations of regression assumptions
par(mfrow = c(2, 2))
plot(interaction_model_wfh)
## Warning: not plotting observations with leverage one:
## 268, 269, 270, 271, 272, 572, 707
plot(interaction_model_rule_of_6)
## Warning: not plotting observations with leverage one:
## 251, 572
plot(interaction_model_eat_out)
## Warning: not plotting observations with leverage one:
## 214, 215, 244
# Compare models including time variables with interaction models
anova(interaction_model_wfh, multiple_model_time)
## Analysis of Variance Table
##
## Model 1: Hires ~ wfh * year * month * day
## Model 2: Hires ~ wfh + rule_of_6_indoors + eat_out_to_help_out + year +
## month + day
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 1027 4.0895e+10
## 2 1346 7.6441e+10 -319 -3.5546e+10 2.7983 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(interaction_model_rule_of_6, multiple_model_time)
## Analysis of Variance Table
##
## Model 1: Hires ~ rule_of_6_indoors * year * month * day
## Model 2: Hires ~ wfh + rule_of_6_indoors + eat_out_to_help_out + year +
## month + day
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 1027 4.2227e+10
## 2 1346 7.6441e+10 -319 -3.4214e+10 2.6085 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(interaction_model_eat_out, multiple_model_time)
## Analysis of Variance Table
##
## Model 1: Hires ~ eat_out_to_help_out * year * month * day
## Model 2: Hires ~ wfh + rule_of_6_indoors + eat_out_to_help_out + year +
## month + day
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 1052 4.3381e+10
## 2 1346 7.6441e+10 -294 -3.306e+10 2.7269 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Check for multicollinearity in the interaction models
vif(interaction_model_wfh, type = "predictor")
## Warning in cor(X): the standard deviation is zero
## GVIFs computed for predictors
## GVIF Df GVIF^(1/(2*Df)) Interacts With Other Predictors
## wfh 1 671 1 year, month, day --
## year 1 671 1 wfh, month, day --
## month 1 671 1 wfh, year, day --
## day 1 671 1 wfh, year, month --
vif(interaction_model_rule_of_6, type = "predictor")
## Warning in cor(X): the standard deviation is zero
## GVIFs computed for predictors
## GVIF Df GVIF^(1/(2*Df)) Interacts With
## rule_of_6_indoors 1 671 1 year, month, day
## year 1 671 1 rule_of_6_indoors, month, day
## month 1 671 1 rule_of_6_indoors, year, day
## day 1 671 1 rule_of_6_indoors, year, month
## Other Predictors
## rule_of_6_indoors --
## year --
## month --
## day --
vif(interaction_model_eat_out, type = "predictor")
## Warning in cor(X): the standard deviation is zero
## GVIFs computed for predictors
## GVIF Df GVIF^(1/(2*Df)) Interacts With
## eat_out_to_help_out 1 671 1 year, month, day
## year 1 671 1 eat_out_to_help_out, month, day
## month 1 671 1 eat_out_to_help_out, year, day
## day 1 671 1 eat_out_to_help_out, year, month
## Other Predictors
## eat_out_to_help_out --
## year --
## month --
## day --
# Estimate marginal means for the interaction models
emm_wfh <- emmeans(interaction_model_wfh, ~ wfh | year | month | day)
emm_rule_of_6 <- emmeans(interaction_model_rule_of_6, ~ rule_of_6_indoors | year | month | day)
emm_eat_out <- emmeans(interaction_model_eat_out, ~ eat_out_to_help_out | year | month | day)
process_emmeans <- function(emm) {
emm_summary <- summary(emm)
emm_filtered <- emm_summary %>%
group_by(year) %>%
filter(!(is.na(lower.CL) & is.na(upper.CL))) %>%
ungroup()
return(emm_filtered)
}
# Apply the filter to each emmeans result
filtered_emm_wfh <- process_emmeans(emm_wfh)
filtered_emm_rule_of_6 <- process_emmeans(emm_rule_of_6)
filtered_emm_eat_out <- process_emmeans(emm_eat_out)
# Combine and display the filtered results
filtered_results <- list(WFH = filtered_emm_wfh, Rule_of_6 = filtered_emm_rule_of_6, Eat_Out = filtered_emm_eat_out)
filtered_results
## $WFH
## # A tibble: 343 × 9
## wfh year month day emmean SE df lower.CL upper.CL
## <dbl> <fct> <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0 2020 Jan Sun 15064. 3155. 1027 8873. 21255.
## 2 1 2021 Jan Sun 15138. 2822. 1027 9600. 20675.
## 3 1 2022 Jan Sun 22780. 2822. 1027 17242. 28317.
## 4 1 2023 Jan Sun 12607. 2822. 1027 7070. 18145.
## 5 0 2020 Feb Sun 10823. 3155. 1027 4632. 17015.
## 6 1 2021 Feb Sun 21346. 3155. 1027 15154. 27537.
## 7 1 2022 Feb Sun 19858. 3155. 1027 13667. 26049.
## 8 1 2023 Feb Sun 16806. 3155. 1027 10615. 22997.
## 9 0 2020 Mar Sun 15005. 3643. 1027 7856. 22154.
## 10 1 2020 Mar Sun 11789. 4462. 1027 3033. 20545.
## # ℹ 333 more rows
##
## $Rule_of_6
## # A tibble: 343 × 9
## rule_of_6_indoors year month day emmean SE df lower.CL upper.CL
## <dbl> <fct> <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0 2020 Jan Sun 15064. 3206. 1027 8773. 21355.
## 2 0 2021 Jan Sun 15138. 2868. 1027 9510. 20765.
## 3 0 2022 Jan Sun 22780. 2868. 1027 17153. 28407.
## 4 0 2023 Jan Sun 12607. 2868. 1027 6980. 18235.
## 5 0 2020 Feb Sun 10823. 3206. 1027 4532. 17115.
## 6 0 2021 Feb Sun 21346. 3206. 1027 15054. 27637.
## 7 0 2022 Feb Sun 19858. 3206. 1027 13567. 26149.
## 8 0 2023 Feb Sun 16806. 3206. 1027 10515. 23097.
## 9 0 2020 Mar Sun 13719. 2868. 1027 8092. 19346.
## 10 0 2021 Mar Sun 26913. 3206. 1027 20622. 33205.
## # ℹ 333 more rows
##
## $Eat_Out
## # A tibble: 318 × 9
## eat_out_to_help_out year month day emmean SE df lower.CL upper.CL
## <dbl> <fct> <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0 2020 Jan Sun 15064. 3211. 1052 8764. 21364.
## 2 0 2021 Jan Sun 15138. 2872. 1052 9502. 20773.
## 3 0 2022 Jan Sun 22780. 2872. 1052 17145. 28415.
## 4 0 2023 Jan Sun 12607. 2872. 1052 6972. 18243.
## 5 0 2020 Feb Sun 10823. 3211. 1052 4523. 17124.
## 6 0 2021 Feb Sun 21346. 3211. 1052 15045. 27646.
## 7 0 2022 Feb Sun 19858. 3211. 1052 13558. 26158.
## 8 0 2023 Feb Sun 16806. 3211. 1052 10506. 23106.
## 9 0 2020 Mar Sun 13719. 2872. 1052 8084. 19354.
## 10 0 2021 Mar Sun 26913. 3211. 1052 20613. 33214.
## # ℹ 308 more rows
filtered_emm_wfh$variable <- "WFH"
filtered_emm_rule_of_6$variable <- "Rule of 6 Indoors"
filtered_emm_eat_out$variable <- "Eat Out to Help Out"
# Standardize column names to use rbind
standardize_colnames <- function(df, standard_names) {
colnames(df) <- standard_names
return(df)
}
standard_names <- colnames(filtered_emm_wfh)
filtered_emm_wfh <- standardize_colnames(filtered_emm_wfh, standard_names)
filtered_emm_rule_of_6 <- standardize_colnames(filtered_emm_rule_of_6, standard_names)
filtered_emm_eat_out <- standardize_colnames(filtered_emm_eat_out, standard_names)
combined_filtered_emm <- rbind(filtered_emm_wfh, filtered_emm_rule_of_6, filtered_emm_eat_out)
# Plot the estimated marginal means (emmeans)
plot_emmeans <- function(emm_filtered, title) {
ggplot(emm_filtered, aes(x = year, y = emmean, group = interaction(year, month, day), color = factor(month))) +
geom_line() +
geom_point() +
facet_wrap(~ variable, scales = "free") +
labs(title = title, x = "Year", y = "Estimated Marginal Mean") +
theme_minimal() +
theme(legend.position = "bottom") +
guides(color = guide_legend(title = "Month"))
}
plot_combined_emm <- plot_emmeans(combined_filtered_emm, "EMMeans for Interaction Models")
print(plot_combined_emm)
This analysis focuses on the effect of COVID-19 policies on bike hires in London. At the data exploration stage, no NA or missing values were found. The histogram also suggested that the data are approximately normally distributed.
I also checked the correlations between the variables considered in the analysis of bike hiring trends during the COVID-19 pandemic in London. WFH shows a weak positive correlation with all the other variables. On the other hand, the rule of 6 indoors shows negative correlations, with coefficients of around -0.09 to -0.15. These are weak, and the negative sign suggests that when one set of restrictions was in place, the rule of 6 was slightly less likely to be in effect. The Eat Out to Help Out scheme shows very weak negative correlations with the other variables, with coefficients of around -0.05 to -0.08. For the time variables, the day of the week has no significant correlation with the restrictions. The month shows negative correlations with coefficients of around -0.19 to -0.31, which might indicate a seasonal pattern to the restrictions, while the year shows negative correlations with the closures and a positive correlation with WFH, indicating a possible change in restriction policies over the years.
ANALYSIS 1
Analysis 1 concentrates on the impact of the COVID-19 policies on bike hires. Simple regression was used to estimate the effect of each policy on bike hires. WFH is a significant deterrent to bike hires, with a coefficient of -4289.7 (p < 2e-16), indicating that working from home is associated with a decrease in bike hires. The ‘rule of six’ indoors policy, on the other hand, appears to encourage bike use, showing a positive effect with a coefficient of 7412 (p < 2e-16). Similarly, the ‘eat out to help out’ scheme is positively associated with bike hires, with a coefficient of 7738.2 (p = 0.000102).
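To make the reading of these coefficients concrete, the sketch below shows that in a simple regression on a 0/1 policy indicator, the slope is the difference between the mean hires on policy days and on non-policy days. The data frame `toy` and its values are hypothetical and are not taken from the bike hire data.

```r
# Hypothetical toy data: daily hires with a 0/1 policy indicator
toy <- data.frame(
  hires  = c(30000, 32000, 31000, 26000, 27000, 25000),
  policy = c(0, 0, 0, 1, 1, 1)
)

# Simple regression of hires on the policy indicator
fit <- lm(hires ~ policy, data = toy)

# Difference between the two group means, computed directly
mean_diff <- mean(toy$hires[toy$policy == 1]) - mean(toy$hires[toy$policy == 0])

# The slope equals the difference in means;
# the intercept is the mean on days without the policy
slope <- unname(coef(fit)["policy"])
intercept <- unname(coef(fit)["(Intercept)"])
print(c(slope = slope, mean_diff = mean_diff, intercept = intercept))
```

Read this way, a coefficient such as -4289.7 for WFH is the estimated gap in average daily hires between days with and without the policy, not a causal effect on its own.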
In this part I also compare what happens when these variables (WFH, rule of six, and eat out to help out) are assessed together, and when time factors are added as further predictors. The combined effects remain significant, with working from home having a coefficient of -4208.7 (p < 2e-16), the ‘rule of six’ showing a positive coefficient of 8054.2 (p < 2e-16), and ‘eat out to help out’ having a coefficient of 4883.4 (p = 0.0154). The variance inflation factors for this model without time variables are low, indicating minimal multicollinearity.
When the time variables are included, the Multiple R-squared increases to 0.4799, which means that nearly 48% of the variability in bike hires is explained once time factors are considered. Moreover, year2022 has a significant positive coefficient of 5291.9 (p < 2e-16), which indicates an increase in bike hires in that year compared to the baseline year. The month coefficients are also highly significant, with May and June showing the largest positive effects on bike hires, 14409.0 (p < 2e-16) and 18811.1 (p < 2e-16), respectively. The days of the week were less significant, though Tuesday to Saturday all had positive coefficients, with Saturday (daySat) having the highest at 3679.6 (p < 2e-16).
The ANOVA comparing the model without time variables to the model with them shows a stark difference in the residual sum of squares (RSS), which drops from 1.3914e+11 to 7.6441e+10, with an F-statistic of 55.199 and a p-value of less than 2.2e-16. This means that time variables play a crucial role in modelling bike hire patterns.
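As a rough check on this comparison, the F-statistic for two nested models can be rebuilt from the two RSS values. The sketch below uses the RSS figures quoted above and the residual degrees of freedom of the time model shown in the ANOVA tables (1346). The number of extra parameters, `df_extra`, is an assumption (3 year, 11 month and 6 day dummy variables) and is not printed in this section.

```r
# Values quoted in the text above
rss_reduced   <- 1.3914e11  # RSS of the model without time variables
rss_full      <- 7.6441e10  # RSS of the model with time variables
df_resid_full <- 1346       # residual df of the time model (from the ANOVA tables)

# Assumption: 3 year + 11 month + 6 day dummy variables = 20 extra parameters
df_extra <- 20

# F = (drop in RSS per extra parameter) / (residual variance of the full model)
f_stat  <- ((rss_reduced - rss_full) / df_extra) / (rss_full / df_resid_full)
p_value <- pf(f_stat, df_extra, df_resid_full, lower.tail = FALSE)

print(round(f_stat, 2))
```

Under that assumption the result should land close to the reported F-statistic, with small differences coming from the rounding of the printed RSS values.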
ANALYSIS 2
This part focuses on the interaction between COVID-19 policies, time variables, and bike hires. The WFH interaction model shows a significant baseline, with an intercept of 15064.00 (p = 2.06e-06). The WFH policy itself shows a positive coefficient of 13319.30 (p = 0.0305). Within the interaction model, August (monthAug) has a negative coefficient of -15506.65 (p = 0.0097), which indicates a decrease in bike hires during this month when more people work from home.
In the rule of six interaction model, April (monthApr) shows a significant positive effect of 19324.50 (p = 2.21e-05). However, the interaction between the rule of six indoors policy in 2021 (rule_of_6_indoors:year2021) and June (monthJun) has a negative coefficient of -30333.27 (p = 0.0018), which indicates a substantial reduction in bike hires under these combined conditions.
The eat out to help out interaction model presents a similar pattern, with significant positive coefficients for months such as May (29109.00, p = 2.32e-11) and June (31047.75, p = 1.36e-11). Among the interaction terms, April 2023 (year2023:monthApr) has a negative coefficient of -13439.50 (p = 0.0276).
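When reading these interaction models, it helps to remember that a main-effect coefficient applies only at the reference levels of the other factors. The sketch below illustrates this with hypothetical toy data (the data frame `toy`, its values, and the two-month factor are invented for illustration). The `policy` coefficient is the policy effect in the reference month, and the interaction coefficient is how much that effect changes in the other month.

```r
# Hypothetical toy data: two observations per policy-by-month cell
toy <- data.frame(
  hires  = c(10, 12, 20, 22, 15, 17, 18, 20),
  policy = c(0, 0, 0, 0, 1, 1, 1, 1),
  month  = factor(c("Jan", "Jan", "Apr", "Apr", "Jan", "Jan", "Apr", "Apr"),
                  levels = c("Jan", "Apr"))
)

fit <- lm(hires ~ policy * month, data = toy)

# Cell means: rows are policy (0, 1), columns are month (Jan, Apr)
cell <- tapply(toy$hires, list(toy$policy, toy$month), mean)

# Policy effect within each month, computed from the cell means
effect_jan <- cell["1", "Jan"] - cell["0", "Jan"]
effect_apr <- cell["1", "Apr"] - cell["0", "Apr"]

# "policy" is the effect in the reference month (Jan) only;
# "policy:monthApr" is the change in that effect in April
b <- coef(fit)
print(c(main = unname(b["policy"]), interaction = unname(b["policy:monthApr"])))
```

In this toy example the policy effect in April is the sum of the two coefficients, so a positive main effect and a negative interaction term can coexist without contradiction.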
plot(interaction_model_wfh)
## Warning: not plotting observations with leverage one:
## 268, 269, 270, 271, 272, 572, 707
plot(interaction_model_rule_of_6)
## Warning: not plotting observations with leverage one:
## 251, 572
plot(interaction_model_eat_out)
## Warning: not plotting observations with leverage one:
## 214, 215, 244
The next part covers the diagnostic plots. The plots for the interaction models assessing the impact of work from home (WFH), the rule of 6, and the eat out policy on bike hires give insight into the validity of the regression assumptions. Overall, these plots suggest that while the models capture a significant amount of variance, there may be concerns about outliers, leverage points, and the assumptions of linearity and equal variance, which could affect the models’ reliability.
The VIF analysis of the interaction models reports a GVIF of 1 for every predictor, which indicates no variance inflation and therefore no sign of multicollinearity. The degrees of freedom reported for each predictor are 671. However, vif() also issued the warning "the standard deviation is zero" for each model, so these values should be interpreted with caution.
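For intuition about what a variance inflation factor measures, the sketch below computes the basic VIF by hand as 1 / (1 - R²), where R² comes from regressing one predictor on the others. It uses simulated, hypothetical predictors (`x1`, `x2`, `x3`) and a hypothetical helper `vif_by_hand()`. It illustrates the general idea only and is not the predictor-level GVIF calculation that `vif(..., type = "predictor")` performs.

```r
set.seed(1)  # reproducible hypothetical data
n  <- 200
x1 <- rnorm(n)
x2 <- 0.9 * x1 + rnorm(n, sd = 0.3)  # deliberately correlated with x1
x3 <- rnorm(n)                       # unrelated to x1 and x2

# VIF for one predictor: regress it on the others, then 1 / (1 - R^2)
vif_by_hand <- function(target, others) {
  d  <- data.frame(target = target, others)
  r2 <- summary(lm(target ~ ., data = d))$r.squared
  1 / (1 - r2)
}

vif_x1 <- vif_by_hand(x1, data.frame(x2, x3))
vif_x3 <- vif_by_hand(x3, data.frame(x1, x2))
print(round(c(x1 = vif_x1, x3 = vif_x3), 2))
```

A value near 1 means the predictor is barely explained by the others, while larger values signal overlap between predictors.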
plot_combined_emm
For the emmeans analysis of the interaction models, I use a plot to make the results easier to read and interpret. The variation in the points across months indicates changes in the mean response due to the interactions of these three COVID-19 policies with time. There is a notable spread in the means for each month, suggesting a potential seasonal pattern or variation in policy effects over time. The points also overlap, which shows that the estimated means for different months are close to each other within the same year, yet there appears to be a discernible shift in the means from year to year.
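As a minimal base-R sketch of what an estimated marginal mean represents when every factor in the model is specified, the example below predicts the mean response for each cell of a reference grid. The data frame `toy` and its values are hypothetical, and `predict()` on `expand.grid()` is used here as a stand-in for illustration rather than as a description of how the emmeans package works internally.

```r
# Hypothetical toy data: hires by year and month (both factors)
toy <- data.frame(
  hires = c(10, 14, 20, 24, 13, 15, 30, 34),
  year  = factor(rep(c("2021", "2022"), each = 4)),
  month = factor(rep(rep(c("Jan", "Jun"), each = 2), times = 2),
                 levels = c("Jan", "Jun"))
)

fit <- lm(hires ~ year * month, data = toy)

# Reference grid: every year-by-month combination
grid <- expand.grid(year = levels(toy$year), month = levels(toy$month))

# Model-based mean for each cell of the grid
grid$fitted_mean <- predict(fit, newdata = grid)
print(grid)
```

Because this toy model is fully interacted, each fitted mean matches the raw average of its cell, which is the quantity each point in the plot is meant to summarise.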
In conclusion, the inclusion of time variables significantly improves model fit, which underlines the importance of considering temporal factors. The interaction models further reveal the complex interplay between COVID-19 policies and time, with significant variation observed across different months and years, underscoring the nuanced impact of these policies on bike hiring behaviour.
# Load the publisher sales data
book_sales <- read.csv("publisher_sales.csv", stringsAsFactors = TRUE)
| Variable | Description |
|---|---|
| sold.by | The book’s publisher name |
| publisher.type | The type of publisher |
| genre | The genre of the book |
| avg.review | The average review score given to the book |
| daily.sales | The number of books sold per day |
| total.reviews | The number of reviews the book has received |
| sale.price | The price of the book |
# Data structure and summary
str(book_sales)
## 'data.frame': 6000 obs. of 7 variables:
## $ sold.by : Factor w/ 13 levels "Amazon Digital Services, Inc.",..: 1 6 1 1 1 1 11 13 1 13 ...
## $ publisher.type: Factor w/ 5 levels "amazon","big five",..: 3 2 5 5 5 5 2 2 5 2 ...
## $ genre : Factor w/ 3 levels "adult_fiction",..: 1 3 1 3 3 1 2 2 2 2 ...
## $ avg.review : num 4.5 4.64 2.5 4.5 4.98 3.98 4.62 3.5 4.64 4.56 ...
## $ daily.sales : num 84.4 113.1 70.8 149.4 135.7 ...
## $ total.reviews : int 151 184 125 225 194 123 130 110 129 119 ...
## $ sale.price : num 5.12 6.91 6.27 4.91 7.39 ...
summary(book_sales)
## sold.by publisher.type
## Amazon Digital Services, Inc. :4271 amazon : 92
## Random House LLC : 486 big five :1688
## Penguin Group (USA) LLC : 346 indie :1243
## HarperCollins Publishers : 274 single author: 789
## Simon and Schuster Digital Sales Inc: 241 small/medium :2188
## Macmillan : 172
## (Other) : 210
## genre avg.review daily.sales total.reviews
## adult_fiction:2000 Min. :0.000 Min. : -0.53 Min. : 0.0
## non_fiction :2000 1st Qu.:4.090 1st Qu.: 64.22 1st Qu.:104.0
## YA_fiction :2000 Median :4.410 Median : 81.59 Median :128.0
## Mean :4.269 Mean : 86.37 Mean :132.7
## 3rd Qu.:4.610 3rd Qu.:104.73 3rd Qu.:163.0
## Max. :4.980 Max. :209.34 Max. :243.0
##
## sale.price
## Min. : 1.17
## 1st Qu.: 7.34
## Median : 9.31
## Mean :10.30
## 3rd Qu.:13.58
## Max. :22.00
##
# Check for missing values
sum(is.na(book_sales))
## [1] 0
### Data Visualization
#### Histogram
```r
# Histograms for various variables
p1 <- ggplot(book_sales, aes(x = daily.sales)) +
geom_histogram(binwidth = 1, fill = "blue") +
labs(title = "Histogram of Daily Sales")
p2 <- ggplot(book_sales, aes(x = sale.price)) +
geom_histogram(binwidth = 1, fill = "darkgreen") +
labs(title = "Histogram of Sale Price")
p3 <- ggplot(book_sales, aes(x = avg.review)) +
geom_histogram(binwidth = 0.1, fill = "red") +
labs(title = "Histogram of Average Review Scores")
p4 <- ggplot(book_sales, aes(x = total.reviews)) +
geom_histogram(binwidth = 10, fill = "orange") +
labs(title = "Histogram of Total Number of Reviews")
# Combine plots into a grid
grid.arrange(p1, p2, p3, p4, ncol = 2)
# Boxplot for daily sales
daily_sales_boxplot <- ggplot(book_sales, aes(y = daily.sales)) +
geom_boxplot(fill = 'blue') +
labs(x = "", y = "Daily Sales",
title = "Boxplot of Daily Sales") +
theme_classic()
# Boxplot for sales price
sales_price_boxplot <- ggplot(book_sales, aes(y = sale.price)) +
geom_boxplot(fill = 'darkgreen') +
labs(x = "", y = "Sale Price",
title = "Boxplot of Sale Price") +
theme_classic()
# Boxplot for average review
avg_review_boxplot <- ggplot(book_sales, aes(y = avg.review)) +
geom_boxplot(fill = 'red') +
labs(x = "", y = "Average Review",
title = "Boxplot of Average Review Scores") +
theme_classic()
# Boxplot for total reviews
total_review_boxplot <- ggplot(book_sales, aes(y = total.reviews)) +
geom_boxplot(fill = 'orange') +
labs(x = "", y = "Total Reviews",
title = "Boxplot of Total Number of Reviews") +
theme_classic()
# Arrange the boxplots into a 2x2 grid
grid.arrange(daily_sales_boxplot, sales_price_boxplot,
avg_review_boxplot, total_review_boxplot,
ncol = 2, nrow = 2)
# Identify rows where avg.review and total.reviews are both equal to 0
zero_review_rows <- book_sales$avg.review == 0 & book_sales$total.reviews == 0
# Display rows with zero reviews
print(book_sales[zero_review_rows, ])
## sold.by publisher.type genre
## 476 Amazon Digital Services, Inc. indie non_fiction
## 599 Amazon Digital Services, Inc. single author YA_fiction
## 668 Penguin Group (USA) LLC big five non_fiction
## 756 Amazon Digital Services, Inc. indie YA_fiction
## 1137 Amazon Digital Services, Inc. indie adult_fiction
## 1397 Amazon Digital Services, Inc. indie non_fiction
## 1449 Random House LLC big five adult_fiction
## 1742 Amazon Digital Services, Inc. small/medium YA_fiction
## 2523 Amazon Digital Services, Inc. small/medium adult_fiction
## 2734 HarperCollins Publishers big five adult_fiction
## 2760 Amazon Digital Services, Inc. indie YA_fiction
## 2860 Amazon Digital Services, Inc. small/medium adult_fiction
## 2967 Amazon Digital Services, Inc. small/medium YA_fiction
## 3358 Amazon Digital Services, Inc. small/medium adult_fiction
## 3397 Amazon Digital Services, Inc. indie non_fiction
## 3839 Amazon Digital Services, Inc. small/medium YA_fiction
## 3945 Penguin Group (USA) LLC big five non_fiction
## 4152 Amazon Digital Services, Inc. indie non_fiction
## 4392 Amazon Digital Services, Inc. amazon adult_fiction
## 4509 Amazon Digital Services, Inc. indie YA_fiction
## 4543 Amazon Digital Services, Inc. indie YA_fiction
## 4779 HarperCollins Publishers big five non_fiction
## 5413 Simon and Schuster Digital Sales Inc big five YA_fiction
## avg.review daily.sales total.reviews sale.price
## 476 0 78.27 0 11.66
## 599 0 125.48 0 7.93
## 668 0 58.06 0 16.38
## 756 0 143.41 0 5.15
## 1137 0 68.63 0 9.92
## 1397 0 65.97 0 14.47
## 1449 0 114.20 0 9.89
## 1742 0 93.56 0 9.48
## 2523 0 63.08 0 7.83
## 2734 0 63.67 0 12.10
## 2760 0 163.70 0 7.25
## 2860 0 88.89 0 5.57
## 2967 0 103.08 0 8.34
## 3358 0 77.08 0 4.96
## 3397 0 74.70 0 14.35
## 3839 0 86.03 0 11.54
## 3945 0 78.21 0 18.39
## 4152 0 64.83 0 11.92
## 4392 0 81.49 0 5.11
## 4509 0 155.53 0 9.27
## 4543 0 92.15 0 8.10
## 4779 0 65.53 0 14.95
## 5413 0 103.13 0 8.82
rcorr(as.matrix(select(book_sales, daily.sales, total.reviews, avg.review)), type = "spearman")
## daily.sales total.reviews avg.review
## daily.sales 1.00 0.68 -0.01
## total.reviews 0.68 1.00 0.02
## avg.review -0.01 0.02 1.00
##
## n= 6000
##
##
## P
## daily.sales total.reviews avg.review
## daily.sales 0.0000 0.5597
## total.reviews 0.0000 0.1777
## avg.review 0.5597 0.1777
# Linear regression to predict daily sales from average review score and total reviews
lm_sales_reviews <- lm(daily.sales ~ avg.review + total.reviews, data = book_sales)
summary(lm_sales_reviews)
##
## Call:
## lm(formula = daily.sales ~ avg.review + total.reviews, data = book_sales)
##
## Residuals:
## Min 1Q Median 3Q Max
## -101.573 -14.529 -0.909 13.669 129.721
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 33.978718 2.357956 14.410 <2e-16 ***
## avg.review -4.306466 0.514881 -8.364 <2e-16 ***
## total.reviews 0.533428 0.007749 68.836 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 22.59 on 5997 degrees of freedom
## Multiple R-squared: 0.4416, Adjusted R-squared: 0.4414
## F-statistic: 2371 on 2 and 5997 DF, p-value: < 2.2e-16
# Regression model for daily sales by total reviews
m.sales.by.total.review <- lm(daily.sales ~ total.reviews, data = book_sales)
summary(m.sales.by.total.review)
##
## Call:
## lm(formula = daily.sales ~ total.reviews, data = book_sales)
##
## Residuals:
## Min 1Q Median 3Q Max
## -102.667 -14.694 -1.125 13.458 147.319
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 16.381371 1.070717 15.30 <2e-16 ***
## total.reviews 0.527490 0.007761 67.97 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 22.72 on 5998 degrees of freedom
## Multiple R-squared: 0.4351, Adjusted R-squared: 0.435
## F-statistic: 4620 on 1 and 5998 DF, p-value: < 2.2e-16
cbind(coef(m.sales.by.total.review), confint(m.sales.by.total.review))
## 2.5 % 97.5 %
## (Intercept) 16.3813714 14.282382 18.4803612
## total.reviews 0.5274902 0.512276 0.5427044
m.sales.by.total.review.emm <- emmeans(m.sales.by.total.review, ~ total.reviews)
# Regression model for daily sales by average review
m.sales.by.avg.review <- lm(daily.sales ~ avg.review, data = book_sales)
summary(m.sales.by.avg.review)
##
## Call:
## lm(formula = daily.sales ~ avg.review, data = book_sales)
##
## Residuals:
## Min 1Q Median 3Q Max
## -86.721 -22.129 -4.722 18.267 123.403
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 90.8944 2.9543 30.767 <2e-16 ***
## avg.review -1.0593 0.6859 -1.544 0.123
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 30.22 on 5998 degrees of freedom
## Multiple R-squared: 0.0003974, Adjusted R-squared: 0.0002308
## F-statistic: 2.385 on 1 and 5998 DF, p-value: 0.1226
cbind(coef(m.sales.by.avg.review), confint(m.sales.by.avg.review))
## 2.5 % 97.5 %
## (Intercept) 90.894430 85.102983 96.6858777
## avg.review -1.059283 -2.403959 0.2853935
m.sales.by.avg.review.emm <- emmeans(m.sales.by.avg.review, ~ avg.review)
# Multiple regression for daily sales using total reviews and average reviews
m.sales.by.total.avg.review <- lm(daily.sales ~ total.reviews + avg.review, data = book_sales)
summary(m.sales.by.total.avg.review)
##
## Call:
## lm(formula = daily.sales ~ total.reviews + avg.review, data = book_sales)
##
## Residuals:
## Min 1Q Median 3Q Max
## -101.573 -14.529 -0.909 13.669 129.721
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 33.978718 2.357956 14.410 <2e-16 ***
## total.reviews 0.533428 0.007749 68.836 <2e-16 ***
## avg.review -4.306466 0.514881 -8.364 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 22.59 on 5997 degrees of freedom
## Multiple R-squared: 0.4416, Adjusted R-squared: 0.4414
## F-statistic: 2371 on 2 and 5997 DF, p-value: < 2.2e-16
cbind(coef(m.sales.by.total.avg.review), confint(m.sales.by.total.avg.review))
## 2.5 % 97.5 %
## (Intercept) 33.9787175 29.3562756 38.6011594
## total.reviews 0.5334285 0.5182371 0.5486198
## avg.review -4.3064660 -5.3158175 -3.2971145
m.sales.by.total.avg.review.emm <- emmeans(m.sales.by.total.avg.review, ~ total.reviews + avg.review)
# Interaction effect between total reviews and average reviews
m.sales.by.total.avg.itr.review <- lm(daily.sales ~ total.reviews * avg.review, data = book_sales)
summary(m.sales.by.total.avg.itr.review)
##
## Call:
## lm(formula = daily.sales ~ total.reviews * avg.review, data = book_sales)
##
## Residuals:
## Min 1Q Median 3Q Max
## -104.318 -14.401 -0.873 13.622 94.395
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 76.107846 4.169256 18.255 < 2e-16 ***
## total.reviews 0.139838 0.033199 4.212 2.57e-05 ***
## avg.review -14.673445 0.991327 -14.802 < 2e-16 ***
## total.reviews:avg.review 0.095620 0.007848 12.184 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 22.32 on 5996 degrees of freedom
## Multiple R-squared: 0.4551, Adjusted R-squared: 0.4548
## F-statistic: 1669 on 3 and 5996 DF, p-value: < 2.2e-16
cbind(coef(m.sales.by.total.avg.itr.review), confint(m.sales.by.total.avg.itr.review))
## 2.5 % 97.5 %
## (Intercept) 76.10784576 67.93460422 84.2810873
## total.reviews 0.13983789 0.07475610 0.2049197
## avg.review -14.67344515 -16.61680183 -12.7300885
## total.reviews:avg.review 0.09561996 0.08023496 0.1110050
m.sales.by.total.avg.itr.review.emm <- emmeans(m.sales.by.total.avg.itr.review, ~ total.reviews + avg.review)
anova(m.sales.by.total.avg.review, m.sales.by.total.avg.itr.review)
## Analysis of Variance Table
##
## Model 1: daily.sales ~ total.reviews + avg.review
## Model 2: daily.sales ~ total.reviews * avg.review
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 5997 3059870
## 2 5996 2985945 1 73925 148.45 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# ANOVA Comparison
anova_comparison_split <- anova(m.sales.by.total.review, m.sales.by.avg.review)
print(anova_comparison_split)
## Analysis of Variance Table
##
## Model 1: daily.sales ~ total.reviews
## Model 2: daily.sales ~ avg.review
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 5998 3095564
## 2 5998 5477558 0 -2381994
# Not significant when the two are compared: the models are not nested (Df = 0), so anova() reports no F test here
anova_comparison_combined <- anova(m.sales.by.total.avg.review, m.sales.by.total.review)
print(anova_comparison_combined)
## Analysis of Variance Table
##
## Model 1: daily.sales ~ total.reviews + avg.review
## Model 2: daily.sales ~ total.reviews
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 5997 3059870
## 2 5998 3095564 -1 -35694 69.957 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova_interaction_comparison <- anova(m.sales.by.total.avg.itr.review, m.sales.by.total.avg.review)
print(anova_interaction_comparison)
## Analysis of Variance Table
##
## Model 1: daily.sales ~ total.reviews * avg.review
## Model 2: daily.sales ~ total.reviews + avg.review
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 5996 2985945
## 2 5997 3059870 -1 -73925 148.45 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Linear regression model to study the effect of sale price on daily sales across genres
lm_sales_price_by_genre <- lm(daily.sales ~ sale.price * genre, data = book_sales)
summary(lm_sales_price_by_genre)
##
## Call:
## lm(formula = daily.sales ~ sale.price * genre, data = book_sales)
##
## Residuals:
## Min 1Q Median 3Q Max
## -115.35 -13.69 0.19 13.54 96.32
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 88.2127 2.0456 43.123 < 2e-16 ***
## sale.price -0.7104 0.2492 -2.851 0.00437 **
## genrenon_fiction -23.6300 4.1904 -5.639 1.79e-08 ***
## genreYA_fiction 52.9727 2.8720 18.444 < 2e-16 ***
## sale.price:genrenon_fiction 0.6383 0.3474 1.838 0.06617 .
## sale.price:genreYA_fiction -2.8284 0.3502 -8.077 7.99e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 21.91 on 5994 degrees of freedom
## Multiple R-squared: 0.4751, Adjusted R-squared: 0.4746
## F-statistic: 1085 on 5 and 5994 DF, p-value: < 2.2e-16
# Scatter plot with regression line for total reviews
scatter_1 <- ggplot(book_sales, aes(x = total.reviews, y = daily.sales)) +
  geom_point(aes(color = genre), alpha = 0.5) +
  geom_smooth(method = "lm") +
  labs(title = "Daily Sales vs Total Reviews by Genre")
# Scatter plot with regression line for sale price
scatter_2 <- ggplot(book_sales, aes(x = sale.price, y = daily.sales)) +
  geom_point(aes(color = genre), alpha = 0.5) +
  geom_smooth(method = "lm") +
  labs(title = "Daily Sales vs Sale Price by Genre")
# ANOVA to compare the price-only model with the model that adds genre and its interaction with price
lm_sales_price <- lm(daily.sales ~ sale.price, data = book_sales)
anova_sales_price_genre <- anova(lm_sales_price, lm_sales_price_by_genre)
print(anova_sales_price_genre)
## Analysis of Variance Table
##
## Model 1: daily.sales ~ sale.price
## Model 2: daily.sales ~ sale.price * genre
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 5998 4065261
## 2 5994 2876489 4 1188772 619.29 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# emmeans to estimate mean daily sales for each genre at the average sale price
emm_genre_effect <- emmeans(lm_sales_price_by_genre, ~ sale.price | genre)
summary(emm_genre_effect)
## genre = adult_fiction:
## sale.price emmean SE df lower.CL upper.CL
## 10.3 80.9 0.759 5994 79.4 82.4
##
## genre = non_fiction:
## sale.price emmean SE df lower.CL upper.CL
## 10.3 63.8 1.233 5994 61.4 66.3
##
## genre = YA_fiction:
## sale.price emmean SE df lower.CL upper.CL
## 10.3 104.7 0.758 5994 103.3 106.2
##
## Confidence level used: 0.95
This analysis of e-book sales data from a publishing company examines what drives daily book sales. In the data exploration stage, no missing values were found. I then inspected the data using histograms and box plots. Some books had an average review score of 0, indicating that they had not yet received any reviews.
I used Spearman correlation tests to examine the relationships between the variables. There was a positive correlation of 0.68 between total reviews and daily sales, indicating that books with more reviews tend to have higher daily sales. In contrast, the correlation between average review scores and daily sales was -0.01, indicating that review quality on its own had almost no association with sales.
In the regression analysis stage, I fitted several models to examine how these variables relate to sales, individually and in combination:
Influence of Total Reviews on Sales: This model, with total reviews as the only predictor (R² = 0.4351), showed a significant positive effect on daily sales. Each additional review was associated with an increase of approximately 0.53 units in daily sales (t(5998) = 67.97, p < 2e-16, 95% CI [0.51, 0.54]).
Role of Average Review Scores in Sales: This model evaluated the effect of average review scores on sales. The relationship was not statistically significant (t(5998) = -1.544, p = 0.123), with a coefficient of -1.0593 (95% CI [-2.40, 0.29]) and an R² of only 0.0004. The point estimate is slightly negative, but because the confidence interval includes zero, this model gives no reliable evidence that sales change with review scores.
Combined Effects of Total and Average Reviews: This model, which includes both total and average reviews (R² = 0.4416), revealed significant effects of both predictors on sales. Total reviews was a positive predictor (β = 0.5334, 95% CI [0.518, 0.549]), whereas average review score had a negative effect once total reviews were controlled for (β = -4.3065, 95% CI [-5.32, -3.30]).
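Interaction Between Total and Average Reviews: An interaction model (R² = 0.4551) further clarified how total and average reviews jointly relate to sales. The significant interaction term (β = 0.0956, p < 2e-16, 95% CI [0.080, 0.111]) indicates that the effect of average review score on sales depends on the number of total reviews: the negative effect of average score weakens as the number of reviews grows. The ANOVA comparison confirms that the interaction model fits significantly better than the additive model (F(1, 5996) = 148.45, p < 2.2e-16).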
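To make the interaction concrete, the conditional slope of avg.review can be computed at different review counts. The sketch below uses only the coefficients printed in the interaction model summary; the review counts are hypothetical values chosen for illustration, and the helper name `avg_review_slope` is not part of the original analysis.

```r
# Coefficients copied from the interaction model summary reported above
b_avg   <- -14.673445  # avg.review slope when total.reviews = 0
b_inter <-   0.095620  # total.reviews:avg.review interaction

# Conditional slope of avg.review at a given number of total reviews
# (hypothetical helper, for illustration only)
avg_review_slope <- function(total_reviews) {
  b_avg + b_inter * total_reviews
}

# Hypothetical review counts, not taken from the data
review_counts <- c(50, 100, 150, 200)
slopes <- avg_review_slope(review_counts)
print(data.frame(total.reviews = review_counts,
                 avg.review.slope = round(slopes, 2)))

# Review count at which the estimated slope changes sign
crossover <- -b_avg / b_inter
print(round(crossover, 1))
```

Under these estimates the slope of avg.review is negative for books with few reviews and turns positive somewhere above 150 total reviews. Whether that crossover lies inside the observed range of total.reviews would need to be checked against the data.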
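For a deeper understanding, genre was also included in the analysis. The publisher divides its books into three genres: adult fiction, non-fiction, and young adult (YA) fiction. The linear regression with a sale price by genre interaction (R² = 0.4751) shows that the genres respond differently to price. For adult fiction, the reference genre, each one-unit increase in sale price is associated with about 0.71 fewer daily sales (β = -0.7104, p = 0.004). The price slope for non-fiction does not differ significantly from that of adult fiction (interaction β = 0.6383, p = 0.066), whereas YA fiction is considerably more price sensitive (interaction β = -2.8284, p < 0.001). The genre coefficients themselves (-23.6300 for non-fiction and 52.9727 for YA fiction) are differences in the intercept relative to adult fiction, that is, differences in expected sales at a sale price of zero, and not effects of price.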
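The price slope within each genre follows from adding the relevant interaction coefficient to the reference slope. The sketch below does this arithmetic with the coefficients printed in the model summary; the variable names are illustrative. The commented `emtrends()` call is an assumption about how the same slopes, with confidence intervals, could be obtained from the fitted model; it was not run in the original analysis.

```r
# Coefficients copied from the sale.price * genre model summary reported above
b_price    <- -0.7104  # sale.price slope for the reference genre (adult_fiction)
b_price_nf <-  0.6383  # sale.price:genrenon_fiction
b_price_ya <- -2.8284  # sale.price:genreYA_fiction

# Price slope within each genre = reference slope + interaction term
price_slopes <- c(
  adult_fiction = b_price,
  non_fiction   = b_price + b_price_nf,
  YA_fiction    = b_price + b_price_ya
)
print(round(price_slopes, 4))

# With the fitted model and the emmeans package loaded, the same slopes
# (plus standard errors and CIs) could be requested with:
# emtrends(lm_sales_price_by_genre, ~ genre, var = "sale.price")
```

On these estimates the non-fiction slope is close to zero, while YA fiction loses roughly 3.5 daily sales per unit increase in price.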
scatter_1
## `geom_smooth()` using formula = 'y ~ x'
scatter_2
## `geom_smooth()` using formula = 'y ~ x'
To visualise the results, I used scatter plots with a fitted regression line, with points coloured by genre. The first plot, daily sales against total reviews, shows a clear positive trend: as the number of reviews increases, so do daily sales. The points from all genres follow this upward pattern, suggesting that, irrespective of genre, customer engagement through reviews is a consistent predictor of sales performance. The second plot, daily sales against sale price, shows sales declining as the price increases, a pattern that is most noticeable for young adult fiction (YA_fiction).
In addition to the regression with genre, I ran an ANOVA model comparison and estimated marginal means (EMMs). The ANOVA comparison is highly significant (F(4, 5994) = 619.29, p < 2.2e-16): the model that includes genre and its interaction with sale price has a much lower RSS (2,876,489) than the model with sale price alone (4,065,261), indicating a substantially better fit when genre is considered. The EMM analysis gives the estimated mean daily sales for each genre at the average sale price of 10.3: 80.9 for adult fiction (95% CI [79.4, 82.4]), 63.8 for non-fiction (95% CI [61.4, 66.3]), and 104.7 for YA fiction (95% CI [103.3, 106.2]). These estimates describe the level of sales at a typical price rather than price sensitivity. The regression slopes indicate that YA fiction, although it sells the most at this price, is the most sensitive to price changes.
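As a check on how these numbers fit together, the EMMs and the F statistic can be reproduced by hand from values already printed in the output. This is a minimal sketch: the coefficients, RSS values and degrees of freedom are copied from the summaries above, the price of 10.3 is the rounded value shown by emmeans, and the variable names are illustrative.

```r
# Coefficients copied from the sale.price * genre model summary reported above
b0         <- 88.2127   # intercept (adult_fiction at price 0)
b_price    <- -0.7104   # sale.price slope for adult_fiction
b_nf       <- -23.6300  # genrenon_fiction intercept shift
b_ya       <- 52.9727   # genreYA_fiction intercept shift
b_price_nf <- 0.6383    # sale.price:genrenon_fiction
b_price_ya <- -2.8284   # sale.price:genreYA_fiction

price <- 10.3  # rounded reference price shown in the emmeans output

# Predicted mean daily sales for each genre at the reference price
emm <- c(
  adult_fiction = b0 + b_price * price,
  non_fiction   = (b0 + b_nf) + (b_price + b_price_nf) * price,
  YA_fiction    = (b0 + b_ya) + (b_price + b_price_ya) * price
)
print(round(emm, 1))

# F statistic for the nested model comparison, from the reported RSS values
rss_price_only <- 4065261  # daily.sales ~ sale.price
rss_with_genre <- 2876489  # daily.sales ~ sale.price * genre
df_diff  <- 4              # extra parameters in the larger model
df_resid <- 5994           # residual df of the larger model
f_stat <- ((rss_price_only - rss_with_genre) / df_diff) /
  (rss_with_genre / df_resid)
print(round(f_stat, 2))
```

Because the inputs are rounded, the results should agree with the reported EMMs and F value only up to rounding.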
In conclusion, the findings suggest that total reviews are a strong predictor of book sales. Average review scores, on the other hand, do not have a significant effect when considered in isolation, although they become significant once total reviews are accounted for. The price of a book does affect its sales, and this effect is moderated by the book’s genre.